{"id":355,"date":"2010-02-25T12:29:49","date_gmt":"2010-02-25T04:29:49","guid":{"rendered":"http:\/\/www.computersolutions.cn\/blog\/?p=355"},"modified":"2010-02-25T13:15:30","modified_gmt":"2010-02-25T05:15:30","slug":"of-qmail-zombies-and-qmail-remote-timeout-issues","status":"publish","type":"post","link":"https:\/\/www.computersolutions.cn\/blog\/2010\/02\/of-qmail-zombies-and-qmail-remote-timeout-issues\/","title":{"rendered":"Of Qmail, Zombies and qmail-remote timeout issues."},"content":{"rendered":"<p>Occasionally, even in a well-maintained system, qmail has issues.<\/p>\n<p>One semi-common issue I get to see is when a server we send mail to doesn&#8217;t time out. This ties up an outgoing mail slot.  Over a period of time, this can lead to issues where the whole outgoing or incoming queue is sitting doing nothing, as every slot is tied up by &#8216;tarpitted&#8217; connections.<\/p>\n<p>Ideally qmail should be able to cope with these.\u00a0 There are settings in qmail to control how long it waits to establish a connection, and how long it waits on a connection that has gone quiet.\u00a0 These settings live in the following files (usually in \/var\/qmail\/control)<\/p>\n<p><!--more--><\/p>\n<blockquote><p>timeoutconnect &#8211; how long qmail waits on the initial outgoing connection before trying another mail server.<br \/>\ntimeoutremote &#8211; how long qmail waits for a connected remote server to respond before timing out.<br \/>\ntimeoutsmtpd &#8211; how long qmail waits before dropping an incoming connection.<\/p><\/blockquote>\n<p>In our system, we set these values to:<br \/>\n30 seconds for timeoutconnect<br \/>\n600 seconds for timeoutremote<br \/>\n360 seconds for timeoutsmtpd<\/p>\n<p>In theory timeoutremote should see qmail drop a connection after 10 minutes (600 seconds).<br \/>\nIn practice, <strong>qmail doesn&#8217;t<\/strong>.<\/p>\n<p>Why?<\/p>\n<p>timeoutremote <strong>only<\/strong> applies if the connection hasn&#8217;t received any data for the timeout period.<br 
\/>\n<em>It doesn&#8217;t apply to the connection time as a whole<\/em>.<br \/>\nIf the remote end sends some data, the timer is reset, and qmail waits for another full timeoutremote period.   If the remote server dribbles back an ACK or similar once every few minutes, it can keep a connection alive for as long as it wants.<\/p>\n<p>This may not happen very often, but it happens often enough to tie up our connection slots over a period of time. I&#8217;ve seen connections go on for days or even weeks in practice.<\/p>\n<p>Ideally one should be able to set a hard limit in qmail, so that any connection open for longer than a certain time gets killed, or at least set something up in ucspi-tcp; however, that&#8217;s something for another time.<\/p>\n<p>Here is a real world example.  <\/p>\n<p>I&#8217;ve run my kill zombie script in test mode (see bottom of page for the script)<\/p>\n<p><code>\/var\/qmail\/bin\/kill-qmail-smtpd-zombies --test<br \/>\n**Running in TEST mode**<br \/>\nRunning:  ps ax -o etime,pid,comm --no-heading | grep qmail-remote | grep ':[0-9][0-9]:' | awk '{print }'<br \/>\n-=-=-=-=-=-=-=-=-=-=-<br \/>\nFound zombies, setting up shotgun.<br \/>\nKilling qmail-remote zombies<br \/>\nkill -9 26707<br \/>\n-=-=-=-=-=-=-=-=-=-=-<\/code><\/p>\n<p>It&#8217;s come up with a connection that&#8217;s been running longer than an hour: 26707<\/p>\n<p>I&#8217;ll double check to see that it&#8217;s correct<\/p>\n<p><code>ps ax -o etime,pid,comm | grep 26707<br \/>\n   01:39:07 26707 qmail-remote<br \/>\n<\/code><\/p>\n<p>Yup, qmail-remote has been running for 1 hour 39 minutes on that connection.<\/p>\n<p>Let&#8217;s check what the connection is<\/p>\n<p><code>ps -ef | grep 26707<br \/>\nroot      2964 17112  0 13:01 pts\/2    00:00:00 grep 26707<br \/>\nqmailr   26707 21959  0 11:23 ?        
00:00:00 qmail-remote bamboo.sz.js.cn  zhangbin@bamboo.sz.js.cn<br \/>\n<\/code><\/p>\n<p>Hmm, it&#8217;s a known troublesome server, <strong>bamboo.sz.js.cn<\/strong>.<br \/>\nIn fact, it&#8217;s the one that caused me to write this article!<\/p>\n<p>Let&#8217;s watch what&#8217;s actually happening in real time.<\/p>\n<p><code>strace -p 26707<br \/>\nProcess 26707 attached - interrupt to quit<br \/>\nread(3, <\/code><\/p>\n<p>[wait for a minute or two&#8230;]<\/p>\n<p>Still nothing.<\/p>\n<p>Hmm, it&#8217;s sitting there waiting for a response to a read.  Guess what happens before the timeout period expires?<br \/>\nYup, we receive some more characters just in time to keep the connection up and running&#8230;<\/p>\n<p>We could set timeoutremote to a lower number, but we do actually have cases where servers are genuinely slow to respond for various spam testing reasons (although they usually pick up speed once they pass those tests), so I prefer another method.<\/p>\n<p>What&#8217;s my current (lazy, in lieu of patching qmail or ucspi-tcp) solution for this?<\/p>\n<p>A zombie-culling script!<\/p>\n<p>To install in your qmail\/bin folder, do the following:<\/p>\n<p><code lang=\"bash\"><br \/>\ncd \/var\/qmail\/bin<br \/>\nwget http:\/\/www.computersolutions.cn\/blog\/wp-content\/uploads\/2010\/02\/kill-qmail-zombies.txt<br \/>\nmv kill-qmail-zombies.txt kill-qmail-zombies.sh<br \/>\nchmod 0700 kill-qmail-zombies.sh<br \/>\n<\/code><\/p>\n<p>The script has help built in; the parameters are:<br \/>\n<code lang=\"bash\">.\/kill-qmail-zombies.sh<br \/>\n--test - Run in test mode (zombie friendly)<br \/>\n--help - Show the help<br \/>\n--force - Kill some zombies!<\/code><\/p>\n<p>e.g.<\/p>\n<p><code lang=\"bash\">.\/kill-qmail-zombies.sh --test<\/code><\/p>\n<p>You could set this to run every few hours from cron, but I <strong>strongly<\/strong> suggest you test first to see if it works correctly.  See the built-in help for more info on that.<\/p>\n<p>Script below for those who want to take a look.  
It&#8217;s one of my first shell scripts, so feel free to laugh, and comment accordingly!<\/p>\n<pre lang=\"bash\">\r\n#!\/bin\/bash\r\n# bash, not plain sh: the script uses the 'function' keyword and arrays\r\n\r\n# ===========================\r\n# qmail zombie killer script\r\n# Version: 1.0\r\n# Author: L. Sheed\r\n# Company: Computer Solutions\r\n# URL: http:\/\/www.computersolutions.cn\r\n# ===========================\r\n\r\nPATH=\/usr\/bin:\/bin\r\n\r\nfunction short_usage\r\n{\r\ncat &lt;&lt;- _EOF_\r\n$0: missing parameter\r\nTry '$0 --help' for more information.\r\n\r\n_EOF_\r\n}\r\n\r\nfunction usage\r\n{\r\ncat &lt;&lt;- _EOF_\r\nParameters:\r\n--force  kill qmail-remote and qmail-smtpd processes (aka zombies) older than 1 hour\r\n--test \t do a test run (no zombie processes will be harmed)\r\n--help   show this help page\r\n\r\nNotes:\r\nStrongly suggest testing first to see if the ps line works correctly on your system before killing any processes!\r\neg -  Run the ps below on your system, and see if the output looks similar\r\n\r\nps ax -o etime,pid,comm --no-heading | grep qmail-smtp\r\n      04:40  6468 qmail-smtpd\r\n      01:47  7473 qmail-smtpd\r\n      01:00  8142 qmail-smtpd\r\n      01:00  8143 qmail-smtpd\r\n      00:46  8235 qmail-smtpd\r\n      00:36  8283 qmail-smtpd\r\n      00:19  8391 qmail-smtpd\r\n      00:11  8445 qmail-smtpd\r\n      00:07  8494 qmail-smtpd\r\n\r\n_EOF_\r\n}\r\n\r\nfunction zap_the_bastards\r\n{\r\n# etime only contains two colons ([[dd-]hh:]mm:ss) once a process is at least an hour old\r\nPLIST=`ps ax -o etime,pid,comm --no-heading | grep \"$WHAT\" | grep ':[0-9][0-9]:' | awk '{print $2}'`\r\n\r\n#In test mode, show what would be called also\r\nif [ \"$test\" = \"1\" ]; then\r\n\techo \"Running:  ps ax -o etime,pid,comm --no-heading | grep $WHAT | grep ':[0-9][0-9]:' | awk '{print $2}'\"\r\nfi\r\n\r\nif [ -n \"${PLIST:-}\" ]\r\nthen\r\n\techo \"-=-=-=-=-=-=-=-=-=-=-\"\r\n\techo \"Found zombies, setting up shotgun.\"\r\n\techo \"Killing $WHAT zombies\"\r\n\tfor p in $PLIST\r\n\tdo\r\n\t\tif [ \"$force\" = \"1\" ]; then\r\n\t\t\techo \"Kabooom:\"\r\n\t\t\tkill -9 $p\r\n\t\tfi\r\n\t\techo \"kill -9 
$p\"\r\n\tdone\r\n\techo \"-=-=-=-=-=-=-=-=-=-=-\"\r\nelse\r\n\techo \"Good news everybody.  No $WHAT zombies found.\"\r\nfi\r\n}\r\n\r\n## Main\r\n\r\n#parse our parameters (exactly one is expected)\r\nif [ $# -ne 1 ]; then\r\n\tshort_usage\r\n\texit 1\r\nfi\r\n\r\nwhile [ \"$1\" != \"\" ]; do\r\n case \"$1\" in\r\n        --force )\r\n        echo \"**Running in FORCE mode**\"\r\n        force=1\r\n        ;;\r\n        --help )\r\n        usage\r\n        exit\r\n        ;;\r\n\t--test )\r\n\techo \"**Running in TEST mode**\"\r\n\ttest=1\r\n\t;;\r\n esac\r\nshift\r\ndone\r\n\r\n#do the deed\r\ntargets=( \"qmail-remote\" \"qmail-smtpd\" )\r\n\r\nfor target in \"${targets[@]}\"\r\ndo\r\n\tWHAT=$target\r\n\tzap_the_bastards\r\ndone<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Occasionally, even in a well-maintained system, qmail has issues. One semi-common issue I get to see is when a server we send mail to doesn&#8217;t time out. This ties up an outgoing mail slot. Over a period of time, this can lead to issues where the whole outgoing or incoming queue is sitting doing nothing, 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[73,25],"tags":[193,192,533,190,191,188],"class_list":["post-355","post","type-post","status-publish","format-standard","hentry","category-email","category-technical-mumbo-jumbo","tag-connection","tag-long","tag-qmail","tag-qmail-remote","tag-timeout","tag-zombies"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/355","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/comments?post=355"}],"version-history":[{"count":13,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/355\/revisions"}],"predecessor-version":[{"id":364,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/355\/revisions\/364"}],"wp:attachment":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/media?parent=355"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/categories?post=355"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/tags?post=355"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}