<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Blacklisting via Ionic&#8217;s Isapi Rewrite Filter</title>
	<atom:link href="http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/</link>
	<description>Web Development Observations and Asides by Corey Salzano</description>
	<lastBuildDate>Sun, 07 Mar 2010 18:58:34 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Cheeso</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-5466</link>
		<dc:creator>Cheeso</dc:creator>
		<pubDate>Tue, 30 Jun 2009 01:19:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-5466</guid>
		<description>@daniel, yes it&#039;s possible to do what you want.  You need a RedirectRule, to redirect from /default/page.htm (in case anyone types it in), and then you need a RewriteRule to rewrite mysite.com to /default/page.htm or whatever it was you wanted to be your default page. 

This is all in the readme.</description>
		<content:encoded><![CDATA[<p>@daniel, yes it&#8217;s possible to do what you want.  You need a RedirectRule, to redirect from /default/page.htm (in case anyone types it in), and then you need a RewriteRule to rewrite mysite.com to /default/page.htm or whatever it was you wanted to be your default page. </p>
<p>This is all in the readme.</p>
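<p>A rough sketch of how the two rules divide the work, written in Python rather than IIRF syntax. The function name <code>handle</code> and the returned tuples are made up for illustration; only the /default/page.htm path comes from the comment above.</p>

```python
import re

DEFAULT_PAGE = "/default/page.htm"   # path taken from the comment above
ROOT = re.compile(r"^/?$")           # empty path or a single slash


def handle(path):
    """Return (action, target) for an incoming request path (hypothetical helper)."""
    if path.lower() == DEFAULT_PAGE:
        # RedirectRule role: the browser is sent back to the bare domain
        return ("redirect", "/")
    if ROOT.match(path):
        # RewriteRule role: served internally, the address bar is unchanged
        return ("rewrite", DEFAULT_PAGE)
    # Anything else is left alone
    return ("pass", path)
```

<p>Under these assumptions, a request for the root is rewritten to the default page, while a request typed in for the default page itself is redirected to the root.</p>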
]]></content:encoded>
	</item>
	<item>
		<title>By: Corey</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-4262</link>
		<dc:creator>Corey</dc:creator>
		<pubDate>Wed, 13 May 2009 13:53:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-4262</guid>
		<description>Jim,

These bots were cluttering up my error logging database table for 404 and 500 errors on a server holding about 250 websites.

Like you, I took small steps initially in case any of the bots turned out to be legit. I have found no trace of any negative consequences.

I am also due to write a new post about all the useless user-agents that I am blocking without an IP address match. The amount of crap out there is astounding.</description>
		<content:encoded><![CDATA[<p>Jim,</p>
<p>These bots were cluttering up my error logging database table for 404 and 500 errors on a server holding about 250 websites.</p>
<p>Like you, I took small steps initially in case any of the bots turned out to be legit. I have found no trace of any negative consequences.</p>
<p>I am also due to write a new post about all the useless user-agents that I am blocking without an IP address match. The amount of crap out there is astounding.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jim</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-4257</link>
		<dc:creator>jim</dc:creator>
		<pubDate>Wed, 13 May 2009 08:53:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-4257</guid>
		<description>Corey, thanks for the response.

I found a site, &quot;Project Honey Pot&quot;, that tracks these things and confirms them by seeing whether they spam. It might be of use (it&#039;s possible I found the link on this blog somewhere).
http://www.projecthoneypot.org/harvester_useragents.php

At this point I&#039;m not getting hit that hard by these &quot;bots&quot;, but if it becomes abusive, to the point that it starts taking a lot of CPU, then I will have to do something about it. I appreciate the info.

Bandwidth for the few K that they take isn&#039;t a problem (yet). What I am way more worried about is losing some possible indexing that could send traffic to my site.

All they seem to do so far is load the first-level pages and go no deeper. They load no images.

Yes, they should properly identify themselves, but maybe they don&#039;t want to because people might use that ID to &quot;game&quot; their system.

If I ever do start to block, I&#039;m probably going to start with the IP list from that &quot;honey pot&quot; place since those are confirmed and see how that goes.

Plus, if these guys who do this get a clue, they can just change the agent string randomly to any number of known browser types. After that the only way you could tell is by their behavior, which may not be the best indication, or by the honey pot method.</description>
		<content:encoded><![CDATA[<p>Corey, thanks for the response.</p>
<p>I found a site, &#8220;Project Honey Pot&#8221;, that tracks these things and confirms them by seeing whether they spam. It might be of use (it&#8217;s possible I found the link on this blog somewhere).<br />
<a href="http://www.projecthoneypot.org/harvester_useragents.php" rel="nofollow">http://www.projecthoneypot.org/harvester_useragents.php</a></p>
<p>At this point I&#8217;m not getting hit that hard by these &#8220;bots&#8221;, but if it becomes abusive, to the point that it starts taking a lot of CPU, then I will have to do something about it. I appreciate the info.</p>
<p>Bandwidth for the few K that they take isn&#8217;t a problem (yet). What I am way more worried about is losing some possible indexing that could send traffic to my site.</p>
<p>All they seem to do so far is load the first-level pages and go no deeper. They load no images.</p>
<p>Yes, they should properly identify themselves, but maybe they don&#8217;t want to because people might use that ID to &#8220;game&#8221; their system.</p>
<p>If I ever do start to block, I&#8217;m probably going to start with the IP list from that &#8220;honey pot&#8221; place since those are confirmed and see how that goes.</p>
<p>Plus, if these guys who do this get a clue, they can just change the agent string randomly to any number of known browser types. After that the only way you could tell is by their behavior, which may not be the best indication, or by the honey pot method.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Corey</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-4228</link>
		<dc:creator>Corey</dc:creator>
		<pubDate>Tue, 12 May 2009 13:01:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-4228</guid>
		<description>Jim, it is impossible to predict why Java bots are hitting your sites. Many Java developers who do not specify a custom user-agent when designing their crawlers end up sending these default user-agents when requesting documents from your server.

I think we agree that the extra load on our web servers caused by Java bots is useless, and blocking these robots is so easy that it simply makes sense.

I am about to update the post with some simplified rewrite rules that I have been using for a couple of weeks.</description>
		<content:encoded><![CDATA[<p>Jim, it is impossible to predict why Java bots are hitting your sites. Many Java developers who do not specify a custom user-agent when designing their crawlers end up sending these default user-agents when requesting documents from your server.</p>
<p>I think we agree that the extra load on our web servers caused by Java bots is useless, and blocking these robots is so easy that it simply makes sense.</p>
<p>I am about to update the post with some simplified rewrite rules that I have been using for a couple of weeks.</p>
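<p>To illustrate the kind of user-agent match involved, here is a minimal Python sketch, not the IIRF rule itself. It assumes the unmodified Java agents look like <code>Java/1.6.0_13</code>; the pattern and the function name <code>is_default_java_agent</code> are hypothetical.</p>

```python
import re

# Assumption: an unmodified Java HTTP client sends "Java/" followed by a version
JAVA_AGENT = re.compile(r"^Java/\d", re.IGNORECASE)


def is_default_java_agent(user_agent):
    """True when the User-Agent looks like an unmodified Java HTTP client."""
    # Treat a missing header as an empty string so the match cannot fail on None
    return bool(JAVA_AGENT.match(user_agent or ""))
```

<p>A crawler that sets its own user-agent string would not be caught by a check like this, which is the limitation Jim raises below.</p>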
]]></content:encoded>
	</item>
	<item>
		<title>By: jim</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-4221</link>
		<dc:creator>jim</dc:creator>
		<pubDate>Tue, 12 May 2009 05:55:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-4221</guid>
		<description>I don&#039;t have any e-mails for them to find on the site, but they keep hitting. Is it really a problem? I&#039;m more worried about the CPU load a lot of extra checking might cause.
And is that all these Java bots are doing? Maybe they are part of some sort of blog site that lists stuff, or other search/sort type of programs.
At first I thought maybe they were some sort of cache system for ISPs so they could keep a local copy and save bandwidth on their network.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t have any e-mails for them to find on the site, but they keep hitting. Is it really a problem? I&#8217;m more worried about the CPU load a lot of extra checking might cause.<br />
And is that all these Java bots are doing? Maybe they are part of some sort of blog site that lists stuff, or other search/sort type of programs.<br />
At first I thought maybe they were some sort of cache system for ISPs so they could keep a local copy and save bandwidth on their network.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Corey</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-2496</link>
		<dc:creator>Corey</dc:creator>
		<pubDate>Sat, 14 Feb 2009 16:33:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-2496</guid>
		<description>Daniel, I have not had time to examine MCMS. You may want to post at the IIRF home page on Microsoft&#039;s Codeplex to see if anyone is running the two together. http://www.codeplex.com/IIRF</description>
		<content:encoded><![CDATA[<p>Daniel, I have not had time to examine MCMS. You may want to post at the IIRF home page on Microsoft&#8217;s Codeplex to see if anyone is running the two together. <a href="http://www.codeplex.com/IIRF" rel="nofollow">http://www.codeplex.com/IIRF</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: daniel</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-2494</link>
		<dc:creator>daniel</dc:creator>
		<pubDate>Sat, 14 Feb 2009 07:43:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-2494</guid>
		<description>Any ideas? It would be great if this is possible. If not, I will have to find some other method to accomplish this. Thanks!</description>
		<content:encoded><![CDATA[<p>Any ideas? It would be great if this is possible. If not, I will have to find some other method to accomplish this. Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: daniel</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-2484</link>
		<dc:creator>daniel</dc:creator>
		<pubDate>Fri, 13 Feb 2009 03:48:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-2484</guid>
		<description>Hi Corey, 

Thanks again for your reply. Yes, that&#039;s what I want to accomplish: to show users only my domain. However, the home/default.html path is not a &#039;physical&#039; file on my server; it is based on MCMS. I think your approach will only work if there is an existing physical file at home/default.html. I saw the generated logs, and it is clearly looking for the physical file. Is my goal still possible with my current setup? Kindly advise. Thanks.</description>
		<content:encoded><![CDATA[<p>Hi Corey, </p>
<p>Thanks again for your reply. Yes, that&#8217;s what I want to accomplish: to show users only my domain. However, the home/default.html path is not a &#8216;physical&#8217; file on my server; it is based on MCMS. I think your approach will only work if there is an existing physical file at home/default.html. I saw the generated logs, and it is clearly looking for the physical file. Is my goal still possible with my current setup? Kindly advise. Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Corey</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-2483</link>
		<dc:creator>Corey</dc:creator>
		<pubDate>Fri, 13 Feb 2009 03:20:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-2483</guid>
		<description>I tested the rules on one of my domains before publishing them for you. If you are getting an error, you may have a configuration problem.

RewriteCond %{SERVER_NAME} ([^\.]+)\.mysite\.com$ [I]

This first line limits the rule to requests for this domain. My server has a few hundred sites that are all configured to use the IIRF ISAPI filter (right-click the website, then Properties, then the ISAPI Filters tab). This condition identifies the site I want to modify from the bunch.

RewriteRule ^/?$ /home/default.htm

This rule says match requests on the root for an empty string or a slash...

^ beginning of string
/? one optional slash
$ end of string

...and rewrite that location to /home/default.htm. The result is that any time the root level of the domain is requested, mysite.com or mysite.com/, the server returns the contents of /home/default.htm for that request. The user sees just mysite.com/ in their browser&#039;s address bar, but they are looking at /home/default.htm.

I believe this is what you are trying to accomplish based on your comments.</description>
		<content:encoded><![CDATA[<p>I tested the rules on one of my domains before publishing them for you. If you are getting an error, you may have a configuration problem.</p>
<p>RewriteCond %{SERVER_NAME} ([^\.]+)\.mysite\.com$ [I]</p>
<p>This first line limits the rule to requests for this domain. My server has a few hundred sites that are all configured to use the IIRF ISAPI filter (right-click the website, then Properties, then the ISAPI Filters tab). This condition identifies the site I want to modify from the bunch.</p>
<p>RewriteRule ^/?$ /home/default.htm</p>
<p>This rule says match requests on the root for an empty string or a slash&#8230;</p>
<p>^ beginning of string<br />
/? one optional slash<br />
$ end of string</p>
<p>&#8230;and rewrite that location to /home/default.htm. The result is that any time the root level of the domain is requested, mysite.com or mysite.com/, the server returns the contents of /home/default.htm for that request. The user sees just mysite.com/ in their browser&#8217;s address bar, but they are looking at /home/default.htm.</p>
<p>I believe this is what you are trying to accomplish based on your comments.</p>
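<p>For anyone who wants to check the two patterns outside of IIS, here is a small Python sketch of the same matching logic. It assumes the condition's pattern is <code>([^\.]+)\.mysite\.com$</code> and that both patterns behave the same way under Python's <code>re</code> module; the function name <code>rewrite</code> is hypothetical and this is not how IIRF is implemented.</p>

```python
import re

# Host pattern from the RewriteCond; the [I] flag means case-insensitive
HOST = re.compile(r"([^\.]+)\.mysite\.com$", re.IGNORECASE)
# Path pattern from the RewriteRule: empty string or a single slash
ROOT = re.compile(r"^/?$")


def rewrite(host, path):
    """Return the path that would be served for this host and request path."""
    if HOST.search(host) and ROOT.match(path):
        # Root request on a matching host: serve the default page internally
        return "/home/default.htm"
    # Every other request is left unchanged
    return path
```

<p>In this sketch <code>rewrite("www.mysite.com", "/")</code> gives /home/default.htm and any deeper path comes back unchanged. Note that the host pattern as written needs a subdomain such as www, so a bare mysite.com host would not match here.</p>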
]]></content:encoded>
	</item>
	<item>
		<title>By: daniel</title>
		<link>http://www.tacticaltechnique.com/bots/blacklisting-via-iirf/comment-page-1/#comment-2480</link>
		<dc:creator>daniel</dc:creator>
		<pubDate>Fri, 13 Feb 2009 01:55:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.tacticaltechnique.com/robots/blacklisting-via-ionics-isapi-rewrite-filter/#comment-2480</guid>
		<description>Hi Corey,

Thanks for your reply. I tried your approach, but it still resulted in an error page. Actually, I don&#039;t want to redirect the page; I just want to mask it, so that when a user browses to my domain (http://www.mysite.com) he or she is not redirected to (http://www.mysite.com/home/default.html)

My goal is really not to show the /home/default.html path (more like to strip it in the address/URL bar). Is this possible in IIRF? Thanks again.</description>
		<content:encoded><![CDATA[<p>Hi Corey,</p>
<p>Thanks for your reply. I tried your approach, but it still resulted in an error page. Actually, I don&#8217;t want to redirect the page; I just want to mask it, so that when a user browses to my domain (<a href="http://www.mysite.com" rel="nofollow">http://www.mysite.com</a>) he or she is not redirected to (<a href="http://www.mysite.com/home/default.html" rel="nofollow">http://www.mysite.com/home/default.html</a>)</p>
<p>My goal is really not to show the /home/default.html path (more like to strip it in the address/URL bar). Is this possible in IIRF? Thanks again.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
