diff --git a/postfix/HISTORY b/postfix/HISTORY index 59019d5b2..9cb1b12b5 100644 --- a/postfix/HISTORY +++ b/postfix/HISTORY @@ -13813,3 +13813,29 @@ Apologies for any names omitted. Bugfix (introduced: 20071004) missing exception handling in smtp-sink per-command delay feature. Victor Duchovni. File: smtpstone/smtp-sink.c. + +20071117-20 + + Revised queue manager with separate mechanisms for + per-destination concurrency control and dead destination + detection. The concurrency control supports non-integer + feedback for more gradual concurrency adjustments, and uses + hysteresis to avoid rapid oscillations. A destination is + declared "dead" after a configurable number of pseudo-cohorts + (number of deliveries equal to a destination's concurrency) + report connection or handshake failure. This work began + with a discussion that Wietse started with Patrik Rak and + Victor Duchovni late January 2004, and that Victor revived + late October 2007. To establish a baseline for further + improvement, Wietse implemented a few simple mechanisms. + + Configuration parameters: qmgr_concurrency_feedback_debug, + qmgr_negative_concurrency_feedback_hysteresis, + qmgr_negative_concurrency_feedback_style, + qmgr_positive_concurrency_feedback_hysteresis, + qmgr_positive_concurrency_feedback_style, qmgr_sacrificial_cohorts. + See postconf(5) for detailed information. Right now, the + defaults are compatible with older Postfix versions. After + further review the number of parameters will be consolidated + and the defaults will select the better algorithms. Files: + qmgr/qmgr_queue.c, qmgr/qmgr_deliver.c. diff --git a/postfix/RELEASE_NOTES b/postfix/RELEASE_NOTES index 62f58225d..c9dff4ce1 100644 --- a/postfix/RELEASE_NOTES +++ b/postfix/RELEASE_NOTES @@ -17,7 +17,46 @@ Incompatibility with Postfix 2.3 and earlier If you upgrade from Postfix 2.3 or earlier, read RELEASE_NOTES-2.4 before proceeding. 
-Major changes with Postfix snapshot 20071110 +Major changes with Postfix snapshot 20071121 +============================================ + +Revised queue manager with separate mechanisms for per-destination +concurrency control and for dead destination detection. The +concurrency control supports non-integer feedback to allow for more +gradual concurrency adjustments, and uses hysteresis to avoid rapid +oscillations. A destination is declared "dead" after a configurable +number of pseudo-cohorts(*) report connection or handshake failure. + +(*) A pseudo-cohort is a number of delivery requests equal to a + destination's delivery concurrency. + +The drawbacks of the old +/-1 feedback scheduler are a) overshoot +due to exponential delivery concurrency growth with each pseudo-cohort(*) +(5-10-20...); b) throttling down to zero concurrency after a single +pseudo-cohort(*) failure. The second problem was especially an issue +with low-concurrency channels where a single failure could be +sufficient to mark a destination as "dead", and suspend further +deliveries. + +The new code is a laboratory model with a multitude of configuration +parameters, so that developers can experiment with different feedback +functions and hysteresis values. This is a baseline against which +further improvements will be measured: a) is the additional improvement +worth the additional complexity; b) is the design sound, i.e. free +from arbitrary constants and other tweaks that optimize for a narrow +range of application. + +New main.cf parameters: qmgr_concurrency_feedback_debug, +qmgr_negative_concurrency_feedback_hysteresis, qmgr_negative_concurrency_feedback_style, +qmgr_positive_concurrency_feedback_hysteresis, qmgr_positive_concurrency_feedback_style, +qmgr_sacrificial_cohorts. See postconf(5) for extensive descriptions. + +The default parameter settings are backwards compatible with older +Postfix versions. 
However, after a testing period, the number of +parameters will be consolidated, and the default settings will be +changed to take advantage of the "better" algorithm. + +Major changes with Postfix snapshot 20071111 ============================================ Header/body checks are now available in the SMTP client, after the diff --git a/postfix/html/postconf.5.html b/postfix/html/postconf.5.html index 5cef8f2da..2d9054f51 100644 --- a/postfix/html/postconf.5.html +++ b/postfix/html/postconf.5.html @@ -5915,6 +5915,18 @@ This feature is available in Postfix 2.0 and later.

+ + +
qmgr_concurrency_feedback_debug +(default: no)
+ +

Make the queue manager's feedback algorithm verbose for performance +analysis purposes.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change.

+ +
qmgr_fudge_factor @@ -5966,6 +5978,136 @@ parameter is 1.

+ + +
qmgr_negative_concurrency_feedback_hysteresis +(default: 1)
+ +

The per-destination integer amount of negative concurrency +feedback that must accumulate between negative adjustments of a +destination's delivery concurrency. The concurrency adjustment is +equal in size to the negative hysteresis value, and is applied at +the beginning of a cycle of (hysteresis / feedback) steps. +At that same time, the destination's positive feedback hysteresis +cycle is reset to its beginning.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions.

+ + +
+ +
qmgr_negative_concurrency_feedback_style +(default: fixed_1)
+ +

The per-destination amount of negative delivery concurrency +feedback, after a delivery completes with a connection or handshake +failure.

+ +
+ +
inverse_concurrency
Variable feedback of +1 / (delivery concurrency). With this setting, and with +"qmgr_negative_concurrency_feedback_hysteresis = 1", the destination's +delivery concurrency is decremented by 1 after each failed +pseudo-cohort, and the destination is marked dead (further delivery +suspended) after the failed pseudo-cohort count reaches +$qmgr_sacrificial_cohorts.
+ +
inverse_sqrt_concurrency
Variable feedback +of 1 / (square root of delivery concurrency). This is an intermediate +form between the other two. It lacks sound justification, and is a +candidate for removal.
+ +
fixed_1
Constant feedback of 1. This setting +is compatible with Postfix versions before 2.5, where a destination's +delivery concurrency is throttled down to zero (and further delivery +suspended) after a single failed pseudo-cohort.
+ +
+ +

A pseudo-cohort is a number of deliveries equal to the destination's +delivery concurrency.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions.

+ + +
+ +
qmgr_positive_concurrency_feedback_hysteresis +(default: 1)
+ +

The per-destination integer amount of positive concurrency +feedback that must accumulate before positive adjustments of a +destination's delivery concurrency. The concurrency adjustment is +equal in size to the positive hysteresis value, and is applied at +the end of a cycle of (hysteresis / feedback) steps. At that +same time, the destination's negative feedback hysteresis cycle is +reset to its beginning.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions.

+ + +
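The two hysteresis cycles described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the qmgr_queue.c code from this change: the names dest_state, on_success and on_failure are hypothetical, the feedback amount is supplied by the caller, and the half-feedback margin in the loop conditions is an assumption made here to absorb floating-point rounding. The lower bound of 1 follows the qmgr_queue_throttle() description in this change.

```c
#include <assert.h>

/* Hypothetical per-destination state for this sketch only. */
struct dest_state {
    int     concurrency;        /* current delivery concurrency */
    double  success;            /* accumulated positive feedback */
    double  failure;            /* remaining negative feedback budget */
};

/* One delivery completed without connection or handshake failure. */
void    on_success(struct dest_state *d, double feedback, int hysteresis)
{
    assert(feedback > 0 && hysteresis > 0);
    d->success += feedback;
    /* Adjust at the END of a cycle of (hysteresis / feedback) steps. */
    while (d->success + feedback / 2 >= hysteresis) {
        d->concurrency += hysteresis;
        d->success -= hysteresis;
        d->failure = 0;         /* restart the negative cycle */
    }
}

/* One delivery completed with a connection or handshake failure. */
void    on_failure(struct dest_state *d, double feedback, int hysteresis)
{
    assert(feedback > 0 && hysteresis > 0);
    d->failure -= feedback;
    /* Adjust at the BEGINNING of a cycle: the first failure dips below 0. */
    while (d->failure + feedback / 2 < 0) {
        d->concurrency -= hysteresis;
        d->failure += hysteresis;
        d->success = 0;         /* restart the positive cycle */
    }
    if (d->concurrency < 1)     /* dead detection is a separate mechanism */
        d->concurrency = 1;
}
```

With a hysteresis of 2 and a feedback of 0.5 per delivery, a cycle has four steps: the fourth consecutive success raises the concurrency by 2, while the first failure lowers it by 2 immediately and the next three failures leave it unchanged.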
+ +
qmgr_positive_concurrency_feedback_style +(default: fixed_1)
+ +

The per-destination amount of positive delivery concurrency +feedback, after a delivery completes without connection or handshake +failure.

+ +
+ +
inverse_concurrency
Variable feedback of +1 / (delivery concurrency). With this setting, and with +"qmgr_positive_concurrency_feedback_hysteresis = 1", the destination's +delivery concurrency is incremented by 1 after each successful +pseudo-cohort, until it reaches the per-destination maximal concurrency +limit.
+ +
inverse_sqrt_concurrency
Variable feedback +of 1 / (square root of delivery concurrency). This is an intermediate +form between the other two. It lacks sound justification, and is a +candidate for removal.
+ +
fixed_1
Constant feedback of 1. This setting +is compatible with Postfix versions before 2.5, where a destination's +delivery concurrency is doubled after each successful pseudo-cohort, +until it reaches the per-destination maximal concurrency limit. +
+ +
+ +

A pseudo-cohort is a number of deliveries equal to the destination's +delivery concurrency.

+ +

This feature is temporarily available in Postfix 2.5. The default +setting is compatible with earlier Postfix versions.

+ + +
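To make the difference between fixed_1 and inverse_concurrency concrete, the following C sketch replays one successful pseudo-cohort with a positive hysteresis of 1 and no per-destination limit. It is an illustration only: the enum values and function names are hypothetical, inverse_sqrt_concurrency is left out to keep the example free of libm, and the half-feedback margin is an assumption made here to absorb rounding error.

```c
#include <assert.h>

/* Hypothetical style codes for this sketch only. */
enum feedback_style { FIXED_1, INVERSE_CONCURRENCY };

/* Positive feedback credited for one successful delivery. */
double  feedback_amount(enum feedback_style style, int concurrency)
{
    assert(concurrency > 0);
    return (style == INVERSE_CONCURRENCY ? 1.0 / concurrency : 1.0);
}

/* Concurrency after one successful pseudo-cohort, hysteresis 1, no limit. */
int     after_successful_cohort(enum feedback_style style, int concurrency)
{
    int     deliveries = concurrency;   /* size of the pseudo-cohort */
    double  credit = 0.0;
    double  feedback;
    int     i;

    for (i = 0; i < deliveries; i++) {
        feedback = feedback_amount(style, concurrency);
        credit += feedback;
        /* Raise the concurrency by 1 for each full unit of credit. */
        while (credit + feedback / 2 >= 1.0) {
            concurrency += 1;
            credit -= 1.0;
        }
    }
    return (concurrency);
}
```

Starting from a concurrency of 5, fixed_1 yields 10 after one pseudo-cohort and 20 after the next, which is the 5-10-20 growth that the release notes describe as overshoot; inverse_concurrency yields 6.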
+ +
qmgr_sacrificial_cohorts +(default: 1)
+ +

How many pseudo-cohorts must suffer connection or handshake +failure before a specific destination is considered unavailable +(and further delivery is suspended). A pseudo-cohort is a number +of deliveries equal to a destination's concurrency. The pseudo-cohort +failure count is reset each time a delivery completes without +connection or handshake failure for that specific destination.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions.

+ +
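The pseudo-cohort failure count described above can be sketched in C as a fractional counter. This is a hedged illustration, not the code in qmgr_queue.c: the names cohort_state and on_delivery_result are hypothetical, the sketch assumes that each failed delivery contributes 1 / concurrency to the count, it keeps the concurrency fixed for simplicity, and the half-step margin is an assumption made here to absorb rounding error.

```c
#include <assert.h>

/* Hypothetical per-destination state for this sketch only. */
struct cohort_state {
    int     concurrency;        /* deliveries per pseudo-cohort */
    double  fail_cohorts;       /* fractional failed pseudo-cohort count */
};

/*
 * Record one delivery result. Returns 1 when the destination should be
 * considered unavailable, 0 otherwise.
 */
int     on_delivery_result(struct cohort_state *s, int failed,
                           int sacrificial_cohorts)
{
    double  step;

    assert(s->concurrency > 0 && sacrificial_cohorts > 0);
    if (!failed) {
        /* Any delivery without connection/handshake failure resets it. */
        s->fail_cohorts = 0.0;
        return (0);
    }
    step = 1.0 / s->concurrency;
    s->fail_cohorts += step;
    return (s->fail_cohorts + step / 2 >= sacrificial_cohorts);
}
```

With a concurrency of 5 and a limit of 1, four consecutive failures leave the destination alive and the fifth marks it dead; a single successful delivery in between restarts the count.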
qmqpd_authorized_clients diff --git a/postfix/html/qmgr.8.html b/postfix/html/qmgr.8.html index ce4106af8..245cffdd4 100644 --- a/postfix/html/qmgr.8.html +++ b/postfix/html/qmgr.8.html @@ -262,6 +262,38 @@ QMGR(8) QMGR(8) tion_concurrency_limit) Idem, for delivery via the named message transport. + qmgr_concurrency_feedback_debug (no) + Make the queue manager's feedback algorithm verbose + for performance analysis purposes. + + qmgr_negative_concurrency_feedback_hysteresis (1) + The per-destination integer amount of negative con- + currency feedback that must accumulate between neg- + ative adjustments of a destination's delivery con- + currency. + + qmgr_negative_concurrency_feedback_style (fixed_1) + The per-destination amount of negative delivery + concurrency feedback, after a delivery completes + with a connection or handshake failure. + + qmgr_positive_concurrency_feedback_hysteresis (1) + The per-destination integer amount of positive con- + currency feedback that must accumulate before posi- + tive adjustments of a destination's delivery con- + currency. + + qmgr_positive_concurrency_feedback_style (fixed_1) + The per-destination amount of positive delivery + concurrency feedback, after a delivery completes + without connection or handshake failure. + + qmgr_sacrificial_cohorts (1) + How many pseudo-cohorts must suffer connection or + handshake failure before a specific destination is + considered unavailable (and further delivery is + suspended). + RECIPIENT SCHEDULING CONTROLS default_destination_recipient_limit (50) The default maximal number of recipients per mes- @@ -305,21 +337,23 @@ QMGR(8) QMGR(8) Idem, for delivery via the named message transport. OTHER RESOURCE AND RATE CONTROLS - minimal_backoff_time (version dependent) + minimal_backoff_time (300s) The minimal time between attempts to deliver a - deferred message. + deferred message; prior to Postfix 2.4 the default + value was 1000s. 
maximal_backoff_time (4000s) - The maximal time between attempts to deliver a + The maximal time between attempts to deliver a deferred message. maximal_queue_lifetime (5d) - The maximal time a message is queued before it is + The maximal time a message is queued before it is sent back as undeliverable. - queue_run_delay (version dependent) - The time between deferred queue scans by the queue - manager. + queue_run_delay (300s) + The time between deferred queue scans by the queue + manager; prior to Postfix 2.4 the default value was + 1000s. transport_retry_time (60s) The time between attempts by the Postfix queue man- diff --git a/postfix/man/man5/postconf.5 b/postfix/man/man5/postconf.5 index d558846de..d39222ab1 100644 --- a/postfix/man/man5/postconf.5 +++ b/postfix/man/man5/postconf.5 @@ -3244,6 +3244,12 @@ clogging up the Postfix active queue. Specify 0 to disable. This feature is enabled with the helpful_warnings parameter. .PP This feature is available in Postfix 2.0 and later. +.SH qmgr_concurrency_feedback_debug (default: no) +Make the queue manager's feedback algorithm verbose for performance +analysis purposes. +.PP +This feature is temporarily available in Postfix 2.5; its final +form is likely to change. .SH qmgr_fudge_factor (default: 100) Obsolete feature: the percentage of delivery resources that a busy mail system will use up for delivery of a large mailing list @@ -3263,6 +3269,97 @@ takes priority over any other in-memory recipient limits (i.e., the global qmgr_message_recipient_limit and the per transport _recipient_limit) if necessary. The minimum value allowed for this parameter is 1. +.SH qmgr_negative_concurrency_feedback_hysteresis (default: 1) +The per-destination integer amount of negative concurrency +feedback that must accumulate between negative adjustments of a +destination's delivery concurrency. 
The concurrency adjustment is +equal in size to the negative hysteresis value, and is applied at +the \fBbeginning\fR of a cycle of (hysteresis / feedback) steps. +At that same time, the destination's positive feedback hysteresis +cycle is reset to its beginning. +.PP +This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions. +.SH qmgr_negative_concurrency_feedback_style (default: fixed_1) +The per-destination amount of negative delivery concurrency +feedback, after a delivery completes with a connection or handshake +failure. +.IP "\fB inverse_concurrency \fR" +Variable feedback of +1 / (delivery concurrency). With this setting, and with +"qmgr_negative_concurrency_feedback_hysteresis = 1", the destination's +delivery concurrency is decremented by 1 after each failed +pseudo-cohort, and the destination is marked dead (further delivery +suspended) after the failed pseudo-cohort count reaches +$qmgr_sacrificial_cohorts. +.IP "\fB inverse_sqrt_concurrency \fR" +Variable feedback +of 1 / (square root of delivery concurrency). This is an intermediate +form between the other two. It lacks sound justification, and is a +candidate for removal. +.IP "\fB fixed_1 \fR" +Constant feedback of 1. This setting +is compatible with Postfix versions before 2.5, where a destination's +delivery concurrency is throttled down to zero (and further delivery +suspended) after a single failed pseudo-cohort. +.PP +A pseudo-cohort is a number of deliveries equal to the destination's +delivery concurrency. +.PP +This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions. +.SH qmgr_positive_concurrency_feedback_hysteresis (default: 1) +The per-destination integer amount of positive concurrency +feedback that must accumulate before positive adjustments of a +destination's delivery concurrency. 
The concurrency adjustment is +equal in size to the positive hysteresis value, and is applied at +the \fBend\fR of a cycle of (hysteresis / feedback) steps. At that +same time, the destination's negative feedback hysteresis cycle is +reset to its beginning. +.PP +This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions. +.SH qmgr_positive_concurrency_feedback_style (default: fixed_1) +The per-destination amount of positive delivery concurrency +feedback, after a delivery completes without connection or handshake +failure. +.IP "\fB inverse_concurrency \fR" +Variable feedback of +1 / (delivery concurrency). With this setting, and with +"qmgr_positive_concurrency_feedback_hysteresis = 1", the destination's +delivery concurrency is incremented by 1 after each successful +pseudo-cohort, until it reaches the per-destination maximal concurrency +limit. +.IP "\fB inverse_sqrt_concurrency \fR" +Variable feedback +of 1 / (square root of delivery concurrency). This is an intermediate +form between the other two. It lacks sound justification, and is a +candidate for removal. +.IP "\fB fixed_1 \fR" +Constant feedback of 1. This setting +is compatible with Postfix versions before 2.5, where a destination's +delivery concurrency is doubled after each successful pseudo-cohort, +until it reaches the per-destination maximal concurrency limit. +.PP +A pseudo-cohort is a number of deliveries equal to the destination's +delivery concurrency. +.PP +This feature is temporarily available in Postfix 2.5. The default +setting is compatible with earlier Postfix versions. +.SH qmgr_sacrificial_cohorts (default: 1) +How many pseudo-cohorts must suffer connection or handshake +failure before a specific destination is considered unavailable +(and further delivery is suspended). A pseudo-cohort is a number +of deliveries equal to a destination's concurrency. 
The pseudo-cohort +failure count is reset each time a delivery completes without +connection or handshake failure for that specific destination. +.PP +This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions. .SH qmqpd_authorized_clients (default: empty) What clients are allowed to connect to the QMQP server port. .PP diff --git a/postfix/man/man8/qmgr.8 b/postfix/man/man8/qmgr.8 index ca129d64d..5e6f55dd4 100644 --- a/postfix/man/man8/qmgr.8 +++ b/postfix/man/man8/qmgr.8 @@ -235,6 +235,29 @@ The default maximal number of parallel deliveries to the same destination. .IP "\fItransport\fB_destination_concurrency_limit ($default_destination_concurrency_limit)\fR" Idem, for delivery via the named message \fItransport\fR. +.IP "\fBqmgr_concurrency_feedback_debug (no)\fR" +Make the queue manager's feedback algorithm verbose for performance +analysis purposes. +.IP "\fBqmgr_negative_concurrency_feedback_hysteresis (1)\fR" +The per-destination integer amount of negative concurrency +feedback that must accumulate between negative adjustments of a +destination's delivery concurrency. +.IP "\fBqmgr_negative_concurrency_feedback_style (fixed_1)\fR" +The per-destination amount of negative delivery concurrency +feedback, after a delivery completes with a connection or handshake +failure. +.IP "\fBqmgr_positive_concurrency_feedback_hysteresis (1)\fR" +The per-destination integer amount of positive concurrency +feedback that must accumulate before positive adjustments of a +destination's delivery concurrency. +.IP "\fBqmgr_positive_concurrency_feedback_style (fixed_1)\fR" +The per-destination amount of positive delivery concurrency +feedback, after a delivery completes without connection or handshake +failure. 
+.IP "\fBqmgr_sacrificial_cohorts (1)\fR" +How many pseudo-cohorts must suffer connection or handshake +failure before a specific destination is considered unavailable +(and further delivery is suspended). .SH "RECIPIENT SCHEDULING CONTROLS" .na .nf @@ -274,15 +297,17 @@ Idem, for delivery via the named message \fItransport\fR. .nf .ad .fi -.IP "\fBminimal_backoff_time (version dependent)\fR" -The minimal time between attempts to deliver a deferred message. +.IP "\fBminimal_backoff_time (300s)\fR" +The minimal time between attempts to deliver a deferred message; +prior to Postfix 2.4 the default value was 1000s. .IP "\fBmaximal_backoff_time (4000s)\fR" The maximal time between attempts to deliver a deferred message. .IP "\fBmaximal_queue_lifetime (5d)\fR" The maximal time a message is queued before it is sent back as undeliverable. -.IP "\fBqueue_run_delay (version dependent)\fR" -The time between deferred queue scans by the queue manager. +.IP "\fBqueue_run_delay (300s)\fR" +The time between deferred queue scans by the queue manager; +prior to Postfix 2.4 the default value was 1000s. .IP "\fBtransport_retry_time (60s)\fR" The time between attempts by the Postfix queue manager to contact a malfunctioning message delivery transport. 
diff --git a/postfix/mantools/postlink b/postfix/mantools/postlink index cf1586db4..8f7b43c47 100755 --- a/postfix/mantools/postlink +++ b/postfix/mantools/postlink @@ -334,6 +334,14 @@ while (<>) { s;\bqmgr_message_recip[-]*\n* *[]*ient_limit\b;$&;g; s;\bqmgr_message_recip[-]*\n* *[]*ient_minimum\b;$&;g; s;\bqmqpd_authorized_clients\b;$&;g; + + s;\bqmgr_negative_concurrency_feedback_hysteresis\b;$&;g; + s;\bqmgr_negative_concurrency_feedback_style\b;$&;g; + s;\bqmgr_positive_concurrency_feedback_hysteresis\b;$&;g; + s;\bqmgr_positive_concurrency_feedback_style\b;$&;g; + s;\bqmgr_sacrificial_cohorts\b;$&;g; + s;\bqmgr_concurrency_feedback_debug\b;$&;g; + s;\bqmqpd_error_delay\b;$&;g; s;\bqmqpd_timeout\b;$&;g; s;\bqueue_directory\b;$&;g; diff --git a/postfix/proto/postconf.proto b/postfix/proto/postconf.proto index ce8fda2f1..42daf4e33 100644 --- a/postfix/proto/postconf.proto +++ b/postfix/proto/postconf.proto @@ -10674,3 +10674,121 @@ that change the delivery time or destination are not available.

This feature is available in Postfix 2.5 and later.

+ +%PARAM qmgr_concurrency_feedback_debug no + +

Make the queue manager's feedback algorithm verbose for performance +analysis purposes.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change.

+ +%PARAM qmgr_sacrificial_cohorts 1 + +

How many pseudo-cohorts must suffer connection or handshake +failure before a specific destination is considered unavailable +(and further delivery is suspended). A pseudo-cohort is a number +of deliveries equal to a destination's concurrency. The pseudo-cohort +failure count is reset each time a delivery completes without +connection or handshake failure for that specific destination.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions.

+ +%PARAM qmgr_negative_concurrency_feedback_hysteresis 1 + +

The per-destination integer amount of negative concurrency +feedback that must accumulate between negative adjustments of a +destination's delivery concurrency. The concurrency adjustment is +equal in size to the negative hysteresis value, and is applied at +the beginning of a cycle of (hysteresis / feedback) steps. +At that same time, the destination's positive feedback hysteresis +cycle is reset to its beginning.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions.

+ +%PARAM qmgr_positive_concurrency_feedback_hysteresis 1 + +

The per-destination integer amount of positive concurrency +feedback that must accumulate before positive adjustments of a +destination's delivery concurrency. The concurrency adjustment is +equal in size to the positive hysteresis value, and is applied at +the end of a cycle of (hysteresis / feedback) steps. At that +same time, the destination's negative feedback hysteresis cycle is +reset to its beginning.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions.

+ +%PARAM qmgr_negative_concurrency_feedback_style fixed_1 + +

The per-destination amount of negative delivery concurrency +feedback, after a delivery completes with a connection or handshake +failure.

+ +
+ +
inverse_concurrency
Variable feedback of +1 / (delivery concurrency). With this setting, and with +"qmgr_negative_concurrency_feedback_hysteresis = 1", the destination's +delivery concurrency is decremented by 1 after each failed +pseudo-cohort, and the destination is marked dead (further delivery +suspended) after the failed pseudo-cohort count reaches +$qmgr_sacrificial_cohorts.
+ +
inverse_sqrt_concurrency
Variable feedback +of 1 / (square root of delivery concurrency). This is an intermediate +form between the other two. It lacks sound justification, and is a +candidate for removal.
+ +
fixed_1
Constant feedback of 1. This setting +is compatible with Postfix versions before 2.5, where a destination's +delivery concurrency is throttled down to zero (and further delivery +suspended) after a single failed pseudo-cohort.
+ +
+ +

A pseudo-cohort is a number of deliveries equal to the destination's +delivery concurrency.

+ +

This feature is temporarily available in Postfix 2.5; its final +form is likely to change. The default setting is compatible with +earlier Postfix versions.

+ +%PARAM qmgr_positive_concurrency_feedback_style fixed_1 + +

The per-destination amount of positive delivery concurrency +feedback, after a delivery completes without connection or handshake +failure.

+ +
+ +
inverse_concurrency
Variable feedback of +1 / (delivery concurrency). With this setting, and with +"qmgr_positive_concurrency_feedback_hysteresis = 1", the destination's +delivery concurrency is incremented by 1 after each successful +pseudo-cohort, until it reaches the per-destination maximal concurrency +limit.
+ +
inverse_sqrt_concurrency
Variable feedback +of 1 / (square root of delivery concurrency). This is an intermediate +form between the other two. It lacks sound justification, and is a +candidate for removal.
+ +
fixed_1
Constant feedback of 1. This setting +is compatible with Postfix versions before 2.5, where a destination's +delivery concurrency is doubled after each successful pseudo-cohort, +until it reaches the per-destination maximal concurrency limit. +
+ +
+ +

A pseudo-cohort is a number of deliveries equal to the destination's +delivery concurrency.

+ +

This feature is temporarily available in Postfix 2.5. The default +setting is compatible with earlier Postfix versions.

diff --git a/postfix/src/global/mail_params.h b/postfix/src/global/mail_params.h index 7b9110107..35b4a5558 100644 --- a/postfix/src/global/mail_params.h +++ b/postfix/src/global/mail_params.h @@ -2830,6 +2830,39 @@ extern char *var_smtp_body_chks; #define VAR_LMTP_BODY_CHKS "lmtp_body_checks" #define DEF_LMTP_BODY_CHKS "" + /* + * Scheduler concurrency feedback algorithms. + */ +#define VAR_QMGR_POS_FDBACK "qmgr_positive_concurrency_feedback_style" +#define DEF_QMGR_POS_FDBACK QMGR_FDBACK_NAME_FIXED_1 +extern char *var_qmgr_pos_feedback; + +#define VAR_QMGR_NEG_FDBACK "qmgr_negative_concurrency_feedback_style" +#define DEF_QMGR_NEG_FDBACK QMGR_FDBACK_NAME_FIXED_1 +extern char *var_qmgr_neg_feedback; + +#define QMGR_FDBACK_NAME_FIXED_1 "fixed_1" +#define QMGR_FDBACK_NAME_INVERSE_1 "inverse_1" /* deprecated */ +#define QMGR_FDBACK_NAME_INVERSE_WIN "inverse_concurrency" +#define QMGR_FDBACK_NAME_INV_SQRT "inverse_sqrt" /* deprecated */ +#define QMGR_FDBACK_NAME_INV_SQRT_WIN "inverse_sqrt_concurrency" + +#define VAR_QMGR_POS_HYST "qmgr_positive_concurrency_feedback_hysteresis" +#define DEF_QMGR_POS_HYST 1 +extern int var_qmgr_pos_hysteresis; + +#define VAR_QMGR_NEG_HYST "qmgr_negative_concurrency_feedback_hysteresis" +#define DEF_QMGR_NEG_HYST 1 +extern int var_qmgr_neg_hysteresis; + +#define VAR_QMGR_SAC_COHORTS "qmgr_sacrificial_cohorts" +#define DEF_QMGR_SAC_COHORTS 1 +extern int var_qmgr_sac_cohorts; + +#define VAR_QMGR_FDBACK_DEBUG "qmgr_concurrency_feedback_debug" +#define DEF_QMGR_FDBACK_DEBUG 0 +extern bool var_qmgr_feedback_debug; + /* LICENSE /* .ad /* .fi diff --git a/postfix/src/global/mail_version.h b/postfix/src/global/mail_version.h index b4d627ed7..51f2f6868 100644 --- a/postfix/src/global/mail_version.h +++ b/postfix/src/global/mail_version.h @@ -20,7 +20,7 @@ * Patches change both the patchlevel and the release date. Snapshots have no * patchlevel; they change the release date only. 
*/ -#define MAIL_RELEASE_DATE "20071111" +#define MAIL_RELEASE_DATE "20071121" #define MAIL_VERSION_NUMBER "2.5" #ifdef SNAPSHOT diff --git a/postfix/src/qmgr/Makefile.in b/postfix/src/qmgr/Makefile.in index d7db1a073..ea62bdecc 100644 --- a/postfix/src/qmgr/Makefile.in +++ b/postfix/src/qmgr/Makefile.in @@ -14,7 +14,7 @@ CFLAGS = $(DEBUG) $(OPT) $(DEFS) TESTPROG= PROG = qmgr INC_DIR = ../../include -LIBS = ../../lib/libmaster.a ../../lib/libglobal.a ../../lib/libutil.a +LIBS = ../../lib/libmaster.a ../../lib/libglobal.a ../../lib/libutil.a -lm .c.o:; $(CC) $(CFLAGS) -c $*.c @@ -290,6 +290,7 @@ qmgr_queue.o: ../../include/htable.h qmgr_queue.o: ../../include/mail_params.h qmgr_queue.o: ../../include/msg.h qmgr_queue.o: ../../include/mymalloc.h +qmgr_queue.o: ../../include/name_code.h qmgr_queue.o: ../../include/recipient_list.h qmgr_queue.o: ../../include/scan_dir.h qmgr_queue.o: ../../include/sys_defs.h diff --git a/postfix/src/qmgr/qmgr.c b/postfix/src/qmgr/qmgr.c index 4cc999671..c932d1c2a 100644 --- a/postfix/src/qmgr/qmgr.c +++ b/postfix/src/qmgr/qmgr.c @@ -205,6 +205,29 @@ /* destination. /* .IP "\fItransport\fB_destination_concurrency_limit ($default_destination_concurrency_limit)\fR" /* Idem, for delivery via the named message \fItransport\fR. +/* .IP "\fBqmgr_concurrency_feedback_debug (no)\fR" +/* Make the queue manager's feedback algorithm verbose for performance +/* analysis purposes. +/* .IP "\fBqmgr_negative_concurrency_feedback_hysteresis (1)\fR" +/* The per-destination integer amount of negative concurrency +/* feedback that must accumulate between negative adjustments of a +/* destination's delivery concurrency. +/* .IP "\fBqmgr_negative_concurrency_feedback_style (fixed_1)\fR" +/* The per-destination amount of negative delivery concurrency +/* feedback, after a delivery completes with a connection or handshake +/* failure. 
+/* .IP "\fBqmgr_positive_concurrency_feedback_hysteresis (1)\fR" +/* The per-destination integer amount of positive concurrency +/* feedback that must accumulate before positive adjustments of a +/* destination's delivery concurrency. +/* .IP "\fBqmgr_positive_concurrency_feedback_style (fixed_1)\fR" +/* The per-destination amount of positive delivery concurrency +/* feedback, after a delivery completes without connection or handshake +/* failure. +/* .IP "\fBqmgr_sacrificial_cohorts (1)\fR" +/* How many pseudo-cohorts must suffer connection or handshake +/* failure before a specific destination is considered unavailable +/* (and further delivery is suspended). /* RECIPIENT SCHEDULING CONTROLS /* .ad /* .fi @@ -238,15 +261,17 @@ /* OTHER RESOURCE AND RATE CONTROLS /* .ad /* .fi -/* .IP "\fBminimal_backoff_time (version dependent)\fR" -/* The minimal time between attempts to deliver a deferred message. +/* .IP "\fBminimal_backoff_time (300s)\fR" +/* The minimal time between attempts to deliver a deferred message; +/* prior to Postfix 2.4 the default value was 1000s. /* .IP "\fBmaximal_backoff_time (4000s)\fR" /* The maximal time between attempts to deliver a deferred message. /* .IP "\fBmaximal_queue_lifetime (5d)\fR" /* The maximal time a message is queued before it is sent back as /* undeliverable. -/* .IP "\fBqueue_run_delay (version dependent)\fR" -/* The time between deferred queue scans by the queue manager. +/* .IP "\fBqueue_run_delay (300s)\fR" +/* The time between deferred queue scans by the queue manager; +/* prior to Postfix 2.4 the default value was 1000s. /* .IP "\fBtransport_retry_time (60s)\fR" /* The time between attempts by the Postfix queue manager to contact /* a malfunctioning message delivery transport. 
@@ -390,6 +415,12 @@ int var_local_rcpt_lim; int var_proc_limit; bool var_verp_bounce_off; int var_qmgr_clog_warn_time; +char *var_qmgr_pos_feedback; +char *var_qmgr_neg_feedback; +int var_qmgr_pos_hysteresis; +int var_qmgr_neg_hysteresis; +int var_qmgr_sac_cohorts; +int var_qmgr_feedback_debug; static QMGR_SCAN *qmgr_scans[2]; @@ -614,6 +645,11 @@ static void qmgr_post_init(char *name, char **unused_argv) qmgr_scans[QMGR_SCAN_IDX_DEFERRED] = qmgr_scan_create(MAIL_QUEUE_DEFERRED); qmgr_scan_request(qmgr_scans[QMGR_SCAN_IDX_INCOMING], QMGR_SCAN_START); qmgr_deferred_run_event(0, (char *) 0); + + /* + * Scheduler initialization. + */ + qmgr_queue_feedback_init(); } MAIL_VERSION_STAMP_DECLARE; @@ -624,6 +660,8 @@ int main(int argc, char **argv) { static CONFIG_STR_TABLE str_table[] = { VAR_DEFER_XPORTS, DEF_DEFER_XPORTS, &var_defer_xports, 0, 0, + VAR_QMGR_POS_FDBACK, DEF_QMGR_POS_FDBACK, &var_qmgr_pos_feedback, 1, 0, + VAR_QMGR_NEG_FDBACK, DEF_QMGR_NEG_FDBACK, &var_qmgr_neg_feedback, 1, 0, 0, }; static CONFIG_TIME_TABLE time_table[] = { @@ -654,11 +692,15 @@ int main(int argc, char **argv) VAR_LOCAL_RCPT_LIMIT, DEF_LOCAL_RCPT_LIMIT, &var_local_rcpt_lim, 0, 0, VAR_LOCAL_CON_LIMIT, DEF_LOCAL_CON_LIMIT, &var_local_con_lim, 0, 0, VAR_PROC_LIMIT, DEF_PROC_LIMIT, &var_proc_limit, 1, 0, + VAR_QMGR_POS_HYST, DEF_QMGR_POS_HYST, &var_qmgr_pos_hysteresis, 1, 0, + VAR_QMGR_NEG_HYST, DEF_QMGR_NEG_HYST, &var_qmgr_neg_hysteresis, 1, 0, + VAR_QMGR_SAC_COHORTS, DEF_QMGR_SAC_COHORTS, &var_qmgr_sac_cohorts, 1, 0, 0, }; static CONFIG_BOOL_TABLE bool_table[] = { VAR_ALLOW_MIN_USER, DEF_ALLOW_MIN_USER, &var_allow_min_user, VAR_VERP_BOUNCE_OFF, DEF_VERP_BOUNCE_OFF, &var_verp_bounce_off, + VAR_QMGR_FDBACK_DEBUG, DEF_QMGR_FDBACK_DEBUG, &var_qmgr_feedback_debug, 0, }; diff --git a/postfix/src/qmgr/qmgr.h b/postfix/src/qmgr/qmgr.h index 6392e098f..7490e842e 100644 --- a/postfix/src/qmgr/qmgr.h +++ b/postfix/src/qmgr/qmgr.h @@ -198,6 +198,9 @@ struct QMGR_QUEUE { int todo_refcount; /* queue 
entries (todo list) */ int busy_refcount; /* queue entries (busy list) */ int window; /* slow open algorithm */ + double success; /* cumulative positive feedback */ + double failure; /* cumulative negative feedback */ + double fail_cohorts; /* pseudo-cohort failure count */ QMGR_TRANSPORT *transport; /* transport linkage */ QMGR_ENTRY_LIST todo; /* todo queue entries */ QMGR_ENTRY_LIST busy; /* messages on the wire */ @@ -217,6 +220,7 @@ extern void qmgr_queue_done(QMGR_QUEUE *); extern void qmgr_queue_throttle(QMGR_QUEUE *, DSN *); extern void qmgr_queue_unthrottle(QMGR_QUEUE *); extern QMGR_QUEUE *qmgr_queue_find(QMGR_TRANSPORT *, const char *); +extern void qmgr_queue_feedback_init(void); #define QMGR_QUEUE_THROTTLED(q) ((q)->window <= 0) diff --git a/postfix/src/qmgr/qmgr_deliver.c b/postfix/src/qmgr/qmgr_deliver.c index 346562897..36c2a7b6b 100644 --- a/postfix/src/qmgr/qmgr_deliver.c +++ b/postfix/src/qmgr/qmgr_deliver.c @@ -317,9 +317,11 @@ static void qmgr_deliver_update(int unused_event, char *context) if (VSTRING_LEN(dsb->reason) == 0) vstring_strcpy(dsb->reason, "unknown error"); vstring_prepend(dsb->reason, SUSPENDED, sizeof(SUSPENDED) - 1); - qmgr_queue_throttle(queue, DSN_FROM_DSN_BUF(dsb)); - if (queue->window == 0) - qmgr_defer_todo(queue, &dsb->dsn); + if (queue->window > 0) { + qmgr_queue_throttle(queue, DSN_FROM_DSN_BUF(dsb)); + if (queue->window == 0) + qmgr_defer_todo(queue, &dsb->dsn); + } } } diff --git a/postfix/src/qmgr/qmgr_queue.c b/postfix/src/qmgr/qmgr_queue.c index 414de5836..4d86aea2f 100644 --- a/postfix/src/qmgr/qmgr_queue.c +++ b/postfix/src/qmgr/qmgr_queue.c @@ -50,8 +50,9 @@ /* transport. A null result means that the queue was not found. /* /* qmgr_queue_throttle() handles a delivery error, and decrements the -/* concurrency limit for the destination. When the concurrency limit -/* for a destination becomes zero, qmgr_queue_throttle() starts a timer +/* concurrency limit for the destination, with a lower bound of 1. 
+/* When the cohort failure bound is reached, qmgr_queue_throttle() +/* sets the concurrency limit to zero and starts a timer /* to re-enable delivery to the destination after a configurable delay. /* /* qmgr_queue_unthrottle() undoes qmgr_queue_throttle()'s effects. @@ -71,7 +72,7 @@ /* P.O. Box 704 /* Yorktown Heights, NY 10598, USA /* -/* Scheduler enhancements: +/* Pre-emptive scheduler enhancements: /* Patrik Rak /* Modra 6 /* 155 00, Prague, Czech Republic @@ -81,6 +82,7 @@ #include #include +#include /* Utility library. */ @@ -88,11 +90,13 @@ #include #include #include +#include /* Global library. */ #include #include +#include /* QMGR_LOG_WINDOW */ /* Application-specific. */ @@ -100,6 +104,81 @@ int qmgr_queue_count; + /* + * Lookup tables for main.cf feedback method names. + */ +#define QMGR_FDBACK_CODE_BAD 0 +#define QMGR_FDBACK_CODE_FIXED_1 1 +#define QMGR_FDBACK_CODE_INVERSE_WIN 2 +#define QMGR_FDBACK_CODE_INVERSE_1 QMGR_FDBACK_CODE_INVERSE_WIN +#define QMGR_FDBACK_CODE_INV_SQRT_WIN 3 +#define QMGR_FDBACK_CODE_INV_SQRT QMGR_FDBACK_CODE_INV_SQRT_WIN + +NAME_CODE qmgr_feedback_map[] = { + QMGR_FDBACK_NAME_FIXED_1, QMGR_FDBACK_CODE_FIXED_1, + QMGR_FDBACK_NAME_INVERSE_WIN, QMGR_FDBACK_CODE_INVERSE_WIN, + QMGR_FDBACK_NAME_INVERSE_1, QMGR_FDBACK_CODE_INVERSE_1, + QMGR_FDBACK_NAME_INV_SQRT_WIN, QMGR_FDBACK_CODE_INV_SQRT_WIN, + QMGR_FDBACK_NAME_INV_SQRT, QMGR_FDBACK_CODE_INV_SQRT, + 0, QMGR_FDBACK_CODE_BAD, +}; +static int qmgr_pos_feedback_idx; +static int qmgr_neg_feedback_idx; + + /* + * Choosing the right feedback method at run-time. + */ +#define QMGR_FEEDBACK_VAL(idx, window) ( \ + (idx) == QMGR_FDBACK_CODE_INVERSE_1 ? (1.0 / (window)) : \ + (idx) == QMGR_FDBACK_CODE_FIXED_1 ? 
(1.0) : \ + (1.0 / sqrt(window)) \ + ) + +#define QMGR_ERROR_OR_RETRY_QUEUE(queue) \ + (strcmp(queue->transport->name, MAIL_SERVICE_RETRY) == 0 \ + || strcmp(queue->transport->name, MAIL_SERVICE_ERROR) == 0) + +#define QMGR_LOG_FEEDBACK(feedback) \ + if (var_qmgr_feedback_debug && !QMGR_ERROR_OR_RETRY_QUEUE(queue)) \ + msg_info("%s: feedback %g", myname, feedback); + +#define QMGR_LOG_WINDOW(queue) \ + if (var_qmgr_feedback_debug && !QMGR_ERROR_OR_RETRY_QUEUE(queue)) \ + msg_info("%s: queue %s: limit %d window %d success %g failure %g fail_cohorts %g", \ + myname, queue->name, queue->transport->dest_concurrency_limit, \ + queue->window, queue->success, queue->failure, queue->fail_cohorts); + +/* qmgr_queue_feedback_init - initialize feedback selection */ + +void qmgr_queue_feedback_init(void) +{ + + /* + * Positive and negative feedback method indices. + */ + qmgr_pos_feedback_idx = name_code(qmgr_feedback_map, NAME_CODE_FLAG_NONE, + var_qmgr_pos_feedback); + if (qmgr_pos_feedback_idx == QMGR_FDBACK_CODE_BAD) + msg_fatal("%s: bad feedback method: %s", + VAR_QMGR_POS_FDBACK, var_qmgr_pos_feedback); + if (var_qmgr_feedback_debug) + msg_info("positive feedback method %d, value at %d: %g", + qmgr_pos_feedback_idx, var_init_dest_concurrency, + QMGR_FEEDBACK_VAL(qmgr_pos_feedback_idx, + var_init_dest_concurrency)); + + qmgr_neg_feedback_idx = name_code(qmgr_feedback_map, NAME_CODE_FLAG_NONE, + var_qmgr_neg_feedback); + if (qmgr_neg_feedback_idx == QMGR_FDBACK_CODE_BAD) + msg_fatal("%s: bad feedback method: %s", + VAR_QMGR_NEG_FDBACK, var_qmgr_neg_feedback); + if (var_qmgr_feedback_debug) + msg_info("negative feedback method %d, value at %d: %g", + qmgr_neg_feedback_idx, var_init_dest_concurrency, + QMGR_FEEDBACK_VAL(qmgr_neg_feedback_idx, + var_init_dest_concurrency)); +} + /* qmgr_queue_unthrottle_wrapper - in case (char *) != (struct *) */ static void qmgr_queue_unthrottle_wrapper(int unused_event, char *context) @@ -122,10 +201,21 @@ void 
qmgr_queue_unthrottle(QMGR_QUEUE *queue) { const char *myname = "qmgr_queue_unthrottle"; QMGR_TRANSPORT *transport = queue->transport; + double feedback; + double multiplier; if (msg_verbose) msg_info("%s: queue %s", myname, queue->name); + /* + * Don't restart the negative feedback hysteresis cycle with every + * positive feedback. Restart it only when we make a positive concurrency + * adjustment (i.e. at the end of a positive feedback hysteresis cycle). + * Otherwise negative feedback would be too aggressive: negative feedback + * takes effect immediately at the start of its hysteresis cycle. + */ + queue->fail_cohorts = 0; + /* * Special case when this site was dead. */ @@ -135,7 +225,13 @@ void qmgr_queue_unthrottle(QMGR_QUEUE *queue) msg_panic("%s: queue %s: window 0 status 0", myname, queue->name); dsn_free(queue->dsn); queue->dsn = 0; - queue->window = transport->init_dest_concurrency; + /* Back from the almost grave, best concurrency is anyone's guess. */ + if (queue->busy_refcount > 0) + queue->window = queue->busy_refcount; + else + queue->window = transport->init_dest_concurrency; + queue->success = queue->failure = 0; + QMGR_LOG_WINDOW(queue); return; } @@ -143,11 +239,35 @@ void qmgr_queue_unthrottle(QMGR_QUEUE *queue) * Increase the destination's concurrency limit until we reach the * transport's concurrency limit. Allow for a margin the size of the * initial destination concurrency, so that we're not too gentle. + * + * Why is the concurrency increment based on preferred concurrency and not + * on the number of outstanding delivery requests? The latter fluctuates + * wildly when deliveries complete in bursts (artificial benchmark + * measurements), and does not account for cached connections. + * + * Keep the window within reasonable distance from actual concurrency + * otherwise negative feedback will be ineffective. This expression + * assumes that busy_refcount changes gradually. 
This is invalid when + * deliveries complete in bursts (artificial benchmark measurements). */ if (transport->dest_concurrency_limit == 0 || transport->dest_concurrency_limit > queue->window) - if (queue->window < queue->busy_refcount + transport->init_dest_concurrency) - queue->window++; + if (queue->window < queue->busy_refcount + transport->init_dest_concurrency) { + feedback = QMGR_FEEDBACK_VAL(qmgr_pos_feedback_idx, queue->window); + QMGR_LOG_FEEDBACK(feedback); + queue->success += feedback; + /* Prepare for overshoot (feedback > hysteresis, rounding error). */ + while (queue->success >= var_qmgr_pos_hysteresis) { + queue->window += var_qmgr_pos_hysteresis; + queue->success -= var_qmgr_pos_hysteresis; + queue->failure = 0; + } + /* Prepare for overshoot. */ + if (transport->dest_concurrency_limit > 0 + && queue->window > transport->dest_concurrency_limit) + queue->window = transport->dest_concurrency_limit; + } + QMGR_LOG_WINDOW(queue); } /* qmgr_queue_throttle - handle destination delivery failure */ @@ -155,6 +275,7 @@ void qmgr_queue_unthrottle(QMGR_QUEUE *queue) void qmgr_queue_throttle(QMGR_QUEUE *queue, DSN *dsn) { const char *myname = "qmgr_queue_throttle"; + double feedback; /* * Sanity checks. @@ -167,13 +288,43 @@ void qmgr_queue_throttle(QMGR_QUEUE *queue, DSN *dsn) myname, queue->name, dsn->status, dsn->reason); /* - * Decrease the destination's concurrency limit until we reach zero, at - * which point the destination is declared dead. Decrease the concurrency - * limit by one, instead of using actual concurrency - 1, to avoid - * declaring a host dead after just one single delivery failure. + * Don't restart the positive feedback hysteresis cycle with every + * negative feedback. Restart it only when we make a negative concurrency + * adjustment (i.e. at the start of a negative feedback hysteresis + * cycle). Otherwise positive feedback would be too weak (positive + * feedback does not take effect until the end of its hysteresis cycle). 
*/ - if (queue->window > 0) - queue->window--; + + /* + * This queue is declared dead after a configurable number of + * pseudo-cohort failures. + */ + if (queue->window > 0) { + queue->fail_cohorts += 1.0 / queue->window; + if (queue->fail_cohorts >= var_qmgr_sac_cohorts) + queue->window = 0; + } + + /* + * Decrease the destination's concurrency limit until we reach 1. Base + * adjustments on the concurrency limit itself, instead of using the + * actual concurrency. The latter fluctuates wildly when deliveries + * complete in bursts (artificial benchmark measurements). + */ + if (queue->window > 1) { + feedback = QMGR_FEEDBACK_VAL(qmgr_neg_feedback_idx, queue->window); + QMGR_LOG_FEEDBACK(feedback); + queue->failure -= feedback; + /* Prepare for overshoot (feedback > hysteresis, rounding error). */ + while (queue->failure < 0) { + queue->window -= var_qmgr_neg_hysteresis; + queue->success = 0; + queue->failure += var_qmgr_neg_hysteresis; + } + /* Prepare for overshoot. */ + if (queue->window < 1) + queue->window = 1; + } /* * Special case for a site that just was declared dead. @@ -184,6 +335,7 @@ void qmgr_queue_throttle(QMGR_QUEUE *queue, DSN *dsn) (char *) queue, var_min_backoff_time); queue->dflags = 0; } + QMGR_LOG_WINDOW(queue); } /* qmgr_queue_done - delete in-core queue for site */ @@ -241,6 +393,7 @@ QMGR_QUEUE *qmgr_queue_create(QMGR_TRANSPORT *transport, const char *name, queue->busy_refcount = 0; queue->transport = transport; queue->window = transport->init_dest_concurrency; + queue->success = queue->failure = queue->fail_cohorts = 0; QMGR_LIST_INIT(queue->todo); QMGR_LIST_INIT(queue->busy); queue->dsn = 0;
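The feedback and hysteresis logic that this patch adds to qmgr_queue_unthrottle() and qmgr_queue_throttle() can be read in isolation as the minimal sketch below. It is an illustration, not the Postfix code. The names `dest_state`, `fb_method`, `fb_value`, `on_success` and `on_failure` are hypothetical stand-ins for QMGR_QUEUE, the QMGR_FDBACK_CODE_* values, QMGR_FEEDBACK_VAL and the two queue functions. The sketch assumes a live destination (window >= 1 on success), and it leaves out the dead-site revival path, the busy_refcount margin test, the throttle timer, and the inverse square root method, which keeps the example free of libm.

```c
#include <assert.h>

/* Hypothetical stand-ins for two of the patch's QMGR_FDBACK_CODE_* values. */
enum fb_method { FB_FIXED_1, FB_INVERSE_WIN };

/* Hypothetical, reduced copy of the per-destination fields in QMGR_QUEUE. */
struct dest_state {
    int     window;                     /* concurrency limit, 0 means dead */
    double  success;                    /* cumulative positive feedback */
    double  failure;                    /* cumulative negative feedback */
    double  fail_cohorts;               /* pseudo-cohort failure count */
};

/* fb_value - feedback amount for one delivery at the given window */

double  fb_value(enum fb_method method, int window)
{
    assert(window > 0);
    return (method == FB_FIXED_1 ? 1.0 : 1.0 / window);
}

/* on_success - positive feedback; limit 0 means no upper bound */

void    on_success(struct dest_state *d, enum fb_method method,
		           int hysteresis, int limit)
{
    d->fail_cohorts = 0;
    if (limit > 0 && d->window >= limit)
	return;
    d->success += fb_value(method, d->window);
    /* Adjust only after a full hysteresis amount has accumulated. */
    while (d->success >= hysteresis) {
	d->window += hysteresis;
	d->success -= hysteresis;
	d->failure = 0;
    }
    if (limit > 0 && d->window > limit)
	d->window = limit;
}

/* on_failure - negative feedback and dead destination detection */

void    on_failure(struct dest_state *d, enum fb_method method,
		           int hysteresis, int sac_cohorts)
{
    /* Each failure counts as 1/window of a pseudo-cohort. */
    if (d->window > 0) {
	d->fail_cohorts += 1.0 / d->window;
	if (d->fail_cohorts >= sac_cohorts)
	    d->window = 0;
    }
    /* Negative feedback takes effect at the start of its cycle. */
    if (d->window > 1) {
	d->failure -= fb_value(method, d->window);
	while (d->failure < 0) {
	    d->window -= hysteresis;
	    d->success = 0;
	    d->failure += hysteresis;
	}
	if (d->window < 1)
	    d->window = 1;
    }
}
```

With FB_FIXED_1 and a hysteresis of 1, each call moves the window by one, which mirrors the old +/-1 behavior that the patch keeps as its default. With FB_INVERSE_WIN, a destination at window 2 needs two successful deliveries before the window grows to 3.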