{"id":845,"date":"2012-05-18T22:19:06","date_gmt":"2012-05-18T14:19:06","guid":{"rendered":"http:\/\/www.computersolutions.cn\/blog\/?p=845"},"modified":"2012-05-18T22:19:06","modified_gmt":"2012-05-18T14:19:06","slug":"msn-bing-crawler-spider-madness","status":"publish","type":"post","link":"https:\/\/www.computersolutions.cn\/blog\/2012\/05\/msn-bing-crawler-spider-madness\/","title":{"rendered":"MSN \/ Bing crawler spider madness."},"content":{"rendered":"<p>One of our colo clients was complaining that his site was slow.<\/p>\n<p>Took a look, and although load was only slightly above normal, he was doing a substantial amount of traffic throughput.<\/p>\n<p>As he has multiple *busy* sites on his server, it was easier to take a look at iftop to see what was leeching the most traffic.<\/p>\n<p>I could immediately see that he had a couple of spiders indexing one of his sites.<br \/>\nNormally this isn&#8217;t a huge issue, as they usually play nicely.<\/p>\n<p>In this case, it seemed to be multiple connections from spiders.<br \/>\nOur robots.txt looked something like this &#8211; <\/p>\n<p><code>cat robots.txt<\/p>\n<p>User-agent: *<br \/>\nDisallow: \/members\/<br \/>\nDisallow: \/activity\/<br \/>\nDisallow: \/ko\/<br \/>\nDisallow: \/fr\/<br \/>\nDisallow: \/zh\/<br \/>\nDisallow: \/th\/<br \/>\nDisallow: \/vi\/<br \/>\nDisallow: \/th\/<br \/>\nDisallow: \/es\/<br \/>\nDisallow: \/ja\/<br \/>\nDisallow: \/it\/<br \/>\nDisallow: \/ru\/<br \/>\nDisallow: \/ar\/<br \/>\nDisallow: \/fi\/<br \/>\nDisallow: \/pl\/<br \/>\nDisallow: \/nl\/<br \/>\nDisallow: \/pt\/<br \/>\nDisallow: \/he\/<br \/>\nDisallow: \/no\/<br \/>\nDisallow: \/sv\/<br \/>\nDisallow: \/zh-tw\/<br \/>\nDisallow: \/cs\/<br \/>\nDisallow: \/de\/<br \/>\nDisallow: \/uk\/<br \/>\nDisallow: \/el\/<br \/>\nDisallow: \/tr\/<br \/>\n<\/code><\/p>\n<p>A quick check of site logs filtered by spiders showed that the majority of the traffic was coming from Bing \/ MSN.<\/p>\n<p>There were at least 10 &#8211; 
15 simultaneous spiders indexing.  Not only that, but Bing \/ MSN was busy indexing all the lovely pages that we&#8217;d explicitly excluded in the site&#8217;s robots.txt file.<\/p>\n<p>*and* it was downloading the robots.txt file, then totally ignoring it.<\/p>\n<p><code>207.46.195.241 - - [18\/May\/2012:03:15:04 +0000] \"GET \/robots.txt HTTP\/1.1\" 404 280 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n207.46.195.241 - - [18\/May\/2012:03:15:57 +0000] \"GET \/ HTTP\/1.1\" 500 975 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n207.46.195.241 - - [18\/May\/2012:03:17:29 +0000] \"GET \/ HTTP\/1.0\" 500 1534 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n207.46.195.241 - - [18\/May\/2012:04:17:30 +0000] \"GET \/ HTTP\/1.0\" 500 1534 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n207.46.195.234 - - [18\/May\/2012:12:57:16 +0000] \"GET \/no\/members\/kxtjanio\/activity\/ HTTP\/1.1\" 200 14555 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:16 +0000] \"GET \/tr\/members\/poluden\/activity\/groups\/?acpage=14 HTTP\/1.1\" 200 17927 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:16 +0000] \"GET \/zh-tw\/members\/bluezat\/forums\/ HTTP\/1.1\" 200 14633 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:18 +0000] \"GET \/ar\/members\/filozofem\/activity\/1643 HTTP\/1.1\" 200 12348 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:18 +0000] \"GET \/pt\/members\/chwacker\/activity\/groups\/ HTTP\/1.1\" 200 18675 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; 
+http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:20 +0000] \"GET \/ru\/members\/maklare\/points\/points\/ HTTP\/1.1\" 200 14945 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:21 +0000] \"GET \/pl\/members\/ken0115\/activity\/groups\/?acpage=7 HTTP\/1.1\" 200 17936 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:21 +0000] \"GET \/zh\/members\/halilfree82\/activity\/favorites\/ HTTP\/1.1\" 200 15677 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:23 +0000] \"GET \/zh\/members\/elwe\/activity\/mentions\/ HTTP\/1.1\" 200 15670 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:23 +0000] \"GET \/no\/members\/afsaneh\/activity\/friends\/?acpage=5 HTTP\/1.1\" 200 17101 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:25 +0000] \"GET \/tr\/members\/ahuy\/activity\/friends\/ HTTP\/1.1\" 200 17458 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:25 +0000] \"GET \/members\/zarus\/activity\/groups\/?acpage=3 HTTP\/1.1\" 200 18131 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.196 - - [18\/May\/2012:12:57:26 +0000] \"GET \/ko\/members\/daniel\/activity\/friends\/?acpage=4 HTTP\/1.1\" 200 18598 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.108.69 - - [18\/May\/2012:12:57:30 +0000] \"GET \/fr\/members\/poluden\/activity\/ HTTP\/1.1\" 200 15480 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; 
+http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:57:31 +0000] \"GET \/fr\/activity\/?acpage=14 HTTP\/1.1\" 200 17299 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:57:31 +0000] \"GET \/zh\/members\/zarus\/activity\/groups\/blog.letsfx.com?acpage=1 HTTP\/1.1\" 200 19401 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:57:33 +0000] \"GET \/th\/members\/chwacker\/activity\/friends\/?acpage=2 HTTP\/1.1\" 200 16651 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.110.198 - - [18\/May\/2012:12:57:34 +0000] \"GET \/vi\/members\/natvp\/activity\/groups\/?acpage=5 HTTP\/1.1\" 200 20382 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.110.198 - - [18\/May\/2012:12:57:34 +0000] \"GET \/th\/members\/helen\/activity\/groups\/?acpage=3 HTTP\/1.1\" 200 18099 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:35 +0000] \"GET \/groups\/bep-study-buddies\/activity\/2971\/ HTTP\/1.1\" 200 13887 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:35 +0000] \"GET \/ko\/members\/ahmedv\/blogs\/ HTTP\/1.1\" 200 15091 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:42 +0000] \"GET \/fr\/members\/cluadiomasu\/points\/ HTTP\/1.1\" 200 14172 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:42 +0000] \"GET \/ja\/members\/shengmao8620\/groups\/ HTTP\/1.1\" 200 15219 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br 
\/>\n65.52.109.200 - - [18\/May\/2012:12:57:45 +0000] \"GET \/es\/blog\/tag\/presentations-2\/ HTTP\/1.1\" 200 17037 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:45 +0000] \"GET \/it\/members\/phuong.vo\/activity\/groups\/ HTTP\/1.1\" 200 18216 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:48 +0000] \"GET \/fi\/members\/stella85\/activity\/1086 HTTP\/1.1\" 200 12350 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:48 +0000] \"GET \/ru\/members\/stella85\/activity\/groups\/?acpage=12 HTTP\/1.1\" 200 21159 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:49 +0000] \"GET \/zh\/members\/kris\/activity\/friends\/ HTTP\/1.1\" 200 18838 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:50 +0000] \"GET \/fr\/members\/cheryl\/activity\/groups\/?acpage=14 HTTP\/1.1\" 200 19312 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:51 +0000] \"GET \/vi\/members\/vecttra\/activity\/groups\/?acpage=4 HTTP\/1.1\" 200 17180 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:53 +0000] \"GET \/fi\/members\/?s=Intermediate&upage=1 HTTP\/1.1\" 200 13955 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.109.200 - - [18\/May\/2012:12:57:53 +0000] \"GET \/ar\/members\/test-user\/activity\/groups\/?acpage=9 HTTP\/1.1\" 200 17499 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - 
[18\/May\/2012:12:57:53 +0000] \"GET \/members\/admin\/friends HTTP\/1.1\" 200 14760 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:57:53 +0000] \"GET \/members\/lasso\/activity\/groups\/?acpage=3 HTTP\/1.1\" 200 18101 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n207.46.13.51 - - [18\/May\/2012:12:57:58 +0000] \"GET \/fi\/members\/hoangdtv7986\/friends HTTP\/1.1\" 200 13741 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.110.17 - - [18\/May\/2012:12:58:01 +0000] \"GET \/ru\/members\/nikoletth\/points\/ HTTP\/1.1\" 200 14970 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:58:01 +0000] \"GET \/ko\/members\/muratoncel3438\/activity\/ HTTP\/1.1\" 200 16100 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:58:01 +0000] \"GET \/fr\/members\/jack\/activity\/ HTTP\/1.1\" 200 15425 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:58:03 +0000] \"GET \/pl\/members\/filozofem\/blogs\/ HTTP\/1.1\" 200 13573 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:58:08 +0000] \"GET \/nl\/members\/mohamedyahia\/activity\/friends\/?acpage=10 HTTP\/1.1\" 200 17835 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:58:08 +0000] \"GET \/pt\/members\/augert\/activity\/favorites\/ HTTP\/1.1\" 200 15187 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:58:10 +0000] \"GET \/ko\/members\/chima78\/activity\/favorites\/ HTTP\/1.1\" 
200 15851 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.151 - - [18\/May\/2012:12:58:14 +0000] \"GET \/he\/members\/wildthing\/activity\/friends\/ HTTP\/1.1\" 200 16883 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n207.46.13.51 - - [18\/May\/2012:12:58:21 +0000] \"GET \/no\/members\/filiz\/activity\/friends\/?acpage=7 HTTP\/1.1\" 200 16440 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.108.69 - - [18\/May\/2012:12:58:30 +0000] \"GET \/members\/david\/activity\/groups\/?acpage=8 HTTP\/1.1\" 200 17208 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n207.46.13.51 - - [18\/May\/2012:12:58:30 +0000] \"GET \/pt\/members\/moniques\/activity\/groups\/?acpage=7 HTTP\/1.1\" 200 18934 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.110.17 - - [18\/May\/2012:12:58:31 +0000] \"GET \/sv\/groups\/bep-study-buddies\/members\/?mlpage=7 HTTP\/1.1\" 200 14784 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.108.69 - - [18\/May\/2012:12:58:30 +0000] \"GET \/zh-tw\/members\/marcos\/activity\/friends\/?acpage=2 HTTP\/1.1\" 200 18219 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.110.17 - - [18\/May\/2012:12:58:31 +0000] \"GET \/ja\/members\/sluconi\/activity\/groups\/?acpage=3 HTTP\/1.1\" 200 21509 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.147 - - [18\/May\/2012:12:58:32 +0000] \"GET \/cs\/members\/joyull\/activity\/favorites\/ HTTP\/1.1\" 200 14216 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.108.69 - - [18\/May\/2012:12:58:32 +0000] \"GET \/zh-tw\/members\/moniques\/activity\/friends\/?acpage=15 HTTP\/1.1\" 
200 18509 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n157.55.17.147 - - [18\/May\/2012:12:58:32 +0000] \"GET \/pt\/members\/luquejee\/activity\/groups\/?acpage=12 HTTP\/1.1\" 200 18612 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n65.52.110.17 - - [18\/May\/2012:12:58:35 +0000] \"GET \/fi\/activity\/?acpage=66 HTTP\/1.1\" 200 16466 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n207.46.13.51 - - [18\/May\/2012:12:58:58 +0000] \"GET \/members\/filozofem\/profile\/ HTTP\/1.1\" 200 13354 \"-\" \"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)\"<br \/>\n^C<br \/>\n<\/code><\/p>\n<p>The ranges in use by bing appear to be 207.46.*, and 65.52.*, 157.55.17.*<br \/>\nA quick check to see who owns those ranges confirms that yes, it is indeed the evil empire.<\/p>\n<p><code><br \/>\nNetRange:       65.52.0.0 - 65.55.255.255<br \/>\nCIDR:           65.52.0.0\/14<br \/>\nOriginAS:<br \/>\nNetName:        MICROSOFT-1BLK<br \/>\nNetHandle:      NET-65-52-0-0-1<br \/>\nParent:         NET-65-0-0-0-0<br \/>\nNetType:        Direct Assignment<br \/>\nRegDate:        2001-02-14<br \/>\nUpdated:        2012-03-20<br \/>\nRef:            http:\/\/whois.arin.net\/rest\/net\/NET-65-52-0-0-1<\/p>\n<p>NetRange:       207.46.0.0 - 207.46.255.255<br \/>\nCIDR:           207.46.0.0\/16<br \/>\nOriginAS:<br \/>\nNetName:        MICROSOFT-GLOBAL-NET<br \/>\nNetHandle:      NET-207-46-0-0-1<br \/>\nParent:         NET-207-0-0-0-0<br \/>\nNetType:        Direct Assignment<br \/>\nRegDate:        1997-03-31<br \/>\nUpdated:        2004-12-09<br \/>\nRef:            http:\/\/whois.arin.net\/rest\/net\/NET-207-46-0-0-1<\/p>\n<p>NetRange:       157.54.0.0 - 157.60.255.255<br \/>\nCIDR:           157.60.0.0\/16, 157.56.0.0\/14, 157.54.0.0\/15<br \/>\nOriginAS:       AS8075<br \/>\nNetName:        MSFT-GFS<br \/>\nNetHandle:      
NET-157-54-0-0-1<br \/>\nParent:         NET-157-0-0-0-0<br \/>\nNetType:        Direct Assignment<br \/>\nComment:        Abuse complaints will only be responded to if sent to abuse@microsoft.com and abuse@msn.com.<br \/>\nRegDate:        1994-04-28<br \/>\nUpdated:        2010-08-19<br \/>\nRef:            http:\/\/whois.arin.net\/rest\/net\/NET-157-54-0-0-1<br \/>\n<\/code><\/p>\n<p>As you can see, they do have an abuse email contact.  Which bounces.<br \/>\nNeed I say anything more?<\/p>\n<p>As I could readily see from the logs that they were completely ignoring the file, even *after* downloading it (e.g. a request for the robots.txt file, followed by more requests from the same IP for folders explicitly denied inside it!), I decided to take some action to block them.<\/p>\n<p>The following will block MSN Bot (Bing) from hammering a site.<\/p>\n<p>#Block 65.52.*<br \/>\niptables -A INPUT -s 65.52.0.0\/14  -j DROP<br \/>\n#Block 207.46.*<br \/>\niptables -A INPUT -s 207.46.0.0\/16  -j DROP<br \/>\n#Block 157.55.17.*<br \/>\niptables -A INPUT -s 157.55.17.0\/24 -j DROP<\/p>\n<p>Note that the 3rd range actually goes from 157.54.0.0 &#8211; 157.60.255.255.<br \/>\nI wasn&#8217;t actually seeing any evilness from the 157.56 &#8211; 157.60.* range, so I&#8217;ve ignored them.  Letting some Bing stuff through is a good idea (assuming they can behave themselves), as we don&#8217;t want to lose SEO goodness on one of the less^H popular search engines.<\/p>\n<p>A quick tail of the logs later, and I could see that the multitude of bandwidth-leeching MSN \/ Bing bots were gone.  
Plus, the site loaded much, much faster.<\/p>\n<p>A quick google (haha) for MSN \/ Bing spiders doing the same thing to others revealed that we aren&#8217;t alone; a number of people complain about exactly the same issue.<\/p>\n<p>According to Bing, they do respect the protocol.<\/p>\n<p><a href=\"http:\/\/www.bing.com\/community\/site_blogs\/b\/webmaster\/archive\/2009\/08\/10\/crawl-delay-and-the-bing-crawler-msnbot.aspx\">http:\/\/www.bing.com\/community\/site_blogs\/b\/webmaster\/archive\/2009\/08\/10\/crawl-delay-and-the-bing-crawler-msnbot.aspx<\/a><\/p>\n<p>My own findings, and a check of others&#8217; findings, show that they do not.<\/p>\n<p>This search might be of interest &#8211;<br \/>\n<a href=\"http:\/\/www.google.com\/search?&#038;q=bing+ignore+robots.txt\">http:\/\/www.google.com\/search?&#038;q=bing+ignore+robots.txt<\/a><\/p>\n<p>We&#8217;re not the only ones &#8211;<br \/>\n<a href=\"http:\/\/techie-buzz.com\/microsoft\/bing-crawler-msnbot-stupid.html\">http:\/\/techie-buzz.com\/microsoft\/bing-crawler-msnbot-stupid.html<\/a><\/p>\n<p><a href=\"http:\/\/www.semwisdom.com\/blog\/msnbot-stupid-plain-evil\">http:\/\/www.semwisdom.com\/blog\/msnbot-stupid-plain-evil<\/a><\/p>\n<p>As we&#8217;ve verified that the IP ranges in use by the crawlers are indeed owned by Microsoft, it&#8217;s pretty evident that they&#8217;re lying.<\/p>\n<p>C&#8217;est la vie.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of our colo clients was complaining that his site was slow. Took a look, and although load was only slightly above normal, he was doing a substantial amount of traffic throughput. 
As he has multiple *busy* sites on his server, it was easier to take a look at iftop to see what was leeching [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[25],"tags":[407,405,404,406],"class_list":["post-845","post","type-post","status-publish","format-standard","hentry","category-technical-mumbo-jumbo","tag-abuse","tag-bing","tag-msn","tag-spider"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/845","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/comments?post=845"}],"version-history":[{"count":1,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/845\/revisions"}],"predecessor-version":[{"id":846,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/845\/revisions\/846"}],"wp:attachment":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/media?parent=845"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/categories?post=845"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/tags?post=845"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}