{"id":642,"date":"2010-12-29T06:32:03","date_gmt":"2010-12-28T22:32:03","guid":{"rendered":"http:\/\/www.computersolutions.cn\/blog\/?p=642"},"modified":"2010-12-29T06:49:38","modified_gmt":"2010-12-28T22:49:38","slug":"debian-system-load-monitoring","status":"publish","type":"post","link":"https:\/\/www.computersolutions.cn\/blog\/2010\/12\/debian-system-load-monitoring\/","title":{"rendered":"Debian System Load monitoring"},"content":{"rendered":"<p>One or two of our servers have been a little bit overloaded recently. <\/p>\n<p>They&#8217;re going to be replaced with beefier machines, but due to a number of issues I haven&#8217;t been able to replace them yet. <\/p>\n<p>Issue #1 &#8211; Pre-Expo, we weren&#8217;t allowed to replace anything.<br \/>\nIssue #2 &#8211; Post-Expo, I&#8217;m no longer allowed in the data center! <\/p>\n<p>We&#8217;re working on sorting issue #2 out, but in the interim I need to keep the older machines running.<\/p>\n<p>I was previously using Monit to monitor system load.<\/p>\n<p>Monit would be a good solution &#8211; it has a web UI, it can stop services if system load goes too high, and it generally keeps working when everything else has failed. This is great when things go poopy, but it has one fatal issue.<\/p>\n<p>It doesn&#8217;t know how to restart stuff once load is back to normal.<br \/>\nTypically, something (usually lots of visitors) pushes the server load to unusable levels for a sustained period, monit goes &#8220;ooh, apache has gone awol&#8221;, and stops it.<br \/>\nUnfortunately, once load is back to normal, monit has no way to start it up again, so I need to manually go to the monit page and start the service.  I do get emailed about things like this, but it still leads to complaints from the 2 clients that appear to monitor their particular websites more closely than monit does.  
<\/p>\n<p>So, I&#8217;ve been looking at other solutions.<\/p>\n<p>One such solution is sysfence.<\/p>\n<p>While sysfence is severely underdocumented (the examples provided don&#8217;t even work!) and appears to be abandoned, it does do the job.<br \/>\nSysfence is a no-bells-and-whistles precursor to monit, but it has the killer feature that monit is missing.<\/p>\n<p>So, how do we use sysfence?<\/p>\n<p><code>apt-get install sysfence<\/code><\/p>\n<p>This will install it, but unfortunately no config file is installed.<\/p>\n<p>So, start off by creating an \/etc\/sysfence folder:<\/p>\n<p><code>mkdir \/etc\/sysfence<br \/>\ncd \/etc\/sysfence<\/code><\/p>\n<p>We&#8217;ll need to create a config file for it, so:<\/p>\n<p><code>pico sysfence.conf<\/code><\/p>\n<p>My sample sysfence config is below (explanation underneath):<\/p>\n<p><code><br \/>\nrule \"ApacheStop\" {<br \/>\n   la1 >= 10.00 or la5 >= 6.0<br \/>\n}<br \/>\nrun '\/etc\/init.d\/apache2 stop;'<br \/>\n<\/code><br \/>\n<code><br \/>\nrule \"ApacheStart\" {<br \/>\n   la1 &lt;= 2<br \/>\n}<br \/>\nrun once '\/etc\/init.d\/apache2 start;'<br \/>\n<\/code><br \/>\n<code><br \/>\nrule \"warning\" { la1 >= 8.00 } run once 'echo \"Load High: BACKUP\" | mail lawrence@computersolutions.cn'<br \/>\n<\/code><\/p>\n<p>I'm having issues with apache causing load to rocket, so I've set up the following rules:<\/p>\n<p>If the 1-minute load average is 10 or more (i.e. the server is going bonkers), or the 5-minute load average is 6 or more, stop apache.<br \/>\nIf the 1-minute load average is 8 or more, send me an email.<br \/>\nIf the 1-minute load average is 2 or less, start apache. Because of the <code>once<\/code> keyword, this only runs once each time load drops to 2 or below.<\/p>\n<p>The documentation at <a href=\"http:\/\/sysfence.sourceforge.net\/\">http:\/\/sysfence.sourceforge.net\/<\/a> goes over how to write a rule.  
Note that the examples there are broken, e.g.:<\/p>\n<p><code>if {<br \/>\n    la1 >= 8.00<br \/>\n} run once 'echo \"SHOW FULL PROCESSLIST\" | mysql | mail my@email.com'<br \/>\n<\/code><\/p>\n<p>The issue? All rules need to have a \"rule name\" specified.<\/p>\n<p>So a corrected, working version would be:<\/p>\n<p><code>if \"some rule\" {<br \/>\n    la1 >= 8.00<br \/>\n} run once 'echo \"SHOW FULL PROCESSLIST\" | mysql | mail my@email.com'<br \/>\n<\/code><\/p>\n<p>Back to our setup.<\/p>\n<p>Now that we've set up a ruleset, we need to run it. Calling <code>sysfence \/etc\/sysfence\/sysfence.conf<\/code> will run it as a daemon.<\/p>\n<p><code>ps -ef<\/code> shows our rulesets running:<\/p>\n<blockquote><p>root      7260     1  0 05:51 ?        00:00:01 sffetch<br \/>\nroot      7261  7260  0 05:51 ?        00:00:00 sfwatch 'warning'<br \/>\nroot      7262  7260  0 05:51 ?        00:00:00 sfwatch 'ApacheStop'<br \/>\nroot      7263  7260  0 05:51 ?        00:00:00 sfwatch 'ApacheStart'<\/p><\/blockquote>\n<p>sffetch is the daemon, and each sfwatch process is one of the rules it runs.<\/p>\n<p>As sysfence is quite rudimentary, you'll need to kill it and start it again whenever you change the rules.<\/p>\n<p>You'll also need to add it to your startup scripts, or create a startup script for it.  I'll be lazy and not go over that right now.  If people are interested, add a comment, and I'll put something up.<\/p>\n<p>Sysfence can be downloaded from <a href=\"http:\/\/sysfence.sourceforge.net\/\">http:\/\/sysfence.sourceforge.net\/<\/a> (or installed via apt-get on a Debian-based OS).<\/p>\n<hr width=400>\n<p>The man page for sysfence is below (note that the examples need a rule name added between the <code>if<\/code> or <code>rule<\/code> keyword and the opening brace, e.g. <code>rule \"name\" 
{<\/code>):<\/p>\n<h2>NAME<\/h2>\n<div class=\"part\">\n<p>sysfence - system resources guard for Linux\n<\/p>\n<\/div>\n<h2>SYNOPSIS<\/h2>\n<div class=\"part\">\n<p><b>sysfence<\/b><br \/>\n&lt;<i>configuration file<\/i>&gt; [&lt;<i>configuration file<\/i>&gt; ...]<\/p>\n<\/div>\n<h2>DESCRIPTION<\/h2>\n<div class=\"part\">\n<p><b>Sysfence<\/b> is a resource monitoring tool designed for Linux machines.<br \/>\nWhile running as a daemon it checks resource levels and takes the desired<br \/>\naction if some values exceed safety limits.\n<\/p>\n<p>\nSysfence can be used for notifying system administrators when something<br \/>\ngoes wrong, stopping services when system performance is dropping too<br \/>\nlow and starting them when it's going up again, periodically restarting<br \/>\nmemory-leaking processes, and dumping system statistics in critical situations.\n<\/p>\n<p>\nSysfence can monitor the following resource levels: load average, used and<br \/>\nfree memory amount, used and free swap space.\n<\/p>\n<\/div>\n<h2>USAGE<\/h2>\n<div class=\"part\">\n<p>Sysfence reads its configuration from the file(s) specified in the argument<br \/>\nlist. Config files may contain one or more rules describing conditions<br \/>\nand actions to be performed.\n<\/p>\n<p>A rule has syntax like this:\n<\/p>\n<p>   if {<br \/>\n      resource1 &gt; limit1<br \/>\n      or<br \/>\n      { resource2 &lt; limit2 and resource3 &lt; limit3 }<br \/>\n   }<br \/>\n   run once 'command-to-be-run'<\/p>\n<p>The block enclosed within {} brackets describes the condition. When its<br \/>\nresult is TRUE, the following command is invoked.\n<\/p>\n<p>The once keyword is optional. If present, the command is executed only<br \/>\nonce after the condition becomes TRUE. The next execution will take place only<br \/>\nif the condition becomes FALSE and then TRUE again. 
Without the once keyword,<br \/>\nthe command is invoked periodically, after every resource check that gives<br \/>\nTRUE, no matter what the condition result was before.\n<\/p>\n<p>The command specified right after the run keyword is passed to \/bin\/sh, so it<br \/>\nmay contain more than one instruction or even a whole script. But be<br \/>\ncareful - rule checking is suspended until command execution has<br \/>\ncompleted! (Other rules are unaffected.)\n<\/p>\n<p>The following resources can be used:<\/p>\n<dl>\n<dt><b>la1<\/b>\n<\/dt>\n<dd>- load average during the last minute.\n<\/dd>\n<dt><b>la5<\/b>\n<\/dt>\n<dd>- load average during the last 5 minutes.\n<\/dd>\n<dt><b>la15<\/b>\n<\/dt>\n<dd>- load average during the last 15 minutes.\n<\/dd>\n<dt><b>memfree<\/b>\n<\/dt>\n<dd>- lower limit for free memory amount.\n<\/dd>\n<dt><b>memused<\/b>\n<\/dt>\n<dd>- upper limit for memory used by processes.\n<\/dd>\n<dt><b>swapfree<\/b>\n<\/dt>\n<dd>- lower limit for free swap space.\n<\/dd>\n<dt><b>swapused<\/b>\n<\/dt>\n<dd>- upper limit for swap space in use.\n<\/dd>\n<\/dl>\n<\/div>\n<h2>EXAMPLES<\/h2>\n<div class=\"part\">\n<p>Do you have problems with MySQL server choking and freezing the whole<br \/>\nsystem? I do. To find queries that cause problems, you may use:\n<\/p>\n<p>if {<br \/>\n    la1 &gt;= 8.00<br \/>\n} run once 'echo \"SHOW FULL PROCESSLIST\" | mysql | mail my@email.com'\n<\/p>\n<p>Of course, that wouldn't prevent your system from being blocked, but the<br \/>\nfollowing rule could. MySQL will be restarted if the 1-minute LA<br \/>\nreaches 10.0 or the 5-minute LA reaches 6.0.\n<\/p>\n<p>if { la1 &gt;= 10.00 or la5 &gt;= 6.0 }<br \/>\nrun '\/etc\/rc.d\/init.d\/mysql stop; sleep 120; \/etc\/rc.d\/init.d\/mysql<br \/>\nstart'\n<\/p>\n<p>We may also restart some services that probably have memory leaks and<br \/>\nuse lots of swap space if not restarted periodically. 
Let's assume<br \/>\nthat 256MB of used swap is enough to give our Zope server a break.\n<\/p>\n<p>if {<br \/>\n    swapused &gt;= 256M<br \/>\n} run '\/etc\/rc.d\/init.d\/zope restart'<\/p>\n<p>We may also alert admins... Notice that you don't have to be r00t:\n<\/p>\n<p>if {<br \/>\n    la15 &gt; 4.0<br \/>\n    and<br \/>\n    {<br \/>\n        swapfree &lt; 64M<br \/>\n        or<br \/>\n        memfree &lt; 128M<br \/>\n    }<br \/>\n} run 'echo \"i wish you were here...\" | sendsms +48ADMINCELLPHONE'\n<\/p>\n<p>Using sysfence version 0.7 or later you may give a rule a name that will<br \/>\nbe used in logs:\n<\/p>\n<p>rule \"high load\" { la1 &gt; 3.0 and la15 &gt; 2.0 } log\n<\/p>\n<p>The rule keyword has the same meaning as if. There are also synonyms for<br \/>\nother keywords. A detailed list is included within the sysfence package.\n<\/p>\n<p>You can find an example config file in <i>\/usr\/share\/doc\/sysfence\/example.conf<\/i>.<\/p>\n<\/div>\n<h2>AUTHOR<\/h2>\n<div class=\"part\">\n<p>Sysfence was written by Michal Saban (emes at pld-linux org) and<br \/>\nMirek Kopertowski (m.kopertowski at post pl)\n<\/p>\n<p>This manual page was created by Lukasz Jachowicz &lt;honey@debian.org&gt;,<br \/>\nfor the Debian project (but may be used by others). It is based on<br \/>\nthe http:\/\/sysfence.sf.net\/ page.\n<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>One or two of our servers have been a little bit overloaded recently. They&#8217;re going to be replaced with beefier machines, but due to a number of issues I haven&#8217;t been able to replace them yet. Issue #1 &#8211; Pre expo, we weren&#8217;t allowed to replace anything. 
Issue #2 &#8211; Post Expo, I&#8217;m no longer [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[25],"tags":[258,304,302,303],"class_list":["post-642","post","type-post","status-publish","format-standard","hentry","category-technical-mumbo-jumbo","tag-debian","tag-load","tag-monit","tag-sysfence"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/642","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/comments?post=642"}],"version-history":[{"count":11,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/642\/revisions"}],"predecessor-version":[{"id":653,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/642\/revisions\/653"}],"wp:attachment":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/media?parent=642"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/categories?post=642"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/tags?post=642"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}