How to Block Java user-agents

A variety of user-agents that begin with “Java” are likely visiting your website. Requests with this type of user-agent come from programs written in Java whose developers did not change the default user-agent string. Here is a list of the Java user-agents I have encountered:


Java/1.4.1_04
Java/1.5.0_02
Java/1.5.0_06
Java/1.5.0_14
Java/1.6.0_02
Java/1.6.0_03
Java/1.6.0_04
Java/1.6.0_07
Java/1.6.0_11
Java/1.6.0_12
Java/1.6.0-oem

I will maintain this list simply for kicks. There is no need to collect an exhaustive list of these user-agent strings in order to block them. As I have mentioned before, I prefer to ban non-human visitors based on a combination of an IP address and a user-agent string.
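
The combined check described above can be sketched outside the web server as a small Python function. This is only an illustration under stated assumptions: the name `should_block`, the `BLOCKED_NETWORKS` list, and the documentation-only network 192.0.2.0/24 are placeholders, not values taken from any real log or server configuration.

```python
import ipaddress

# Hypothetical block list; replace with networks seen in your own logs.
# 192.0.2.0/24 is a documentation-only range used here as a placeholder.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def should_block(remote_addr: str, user_agent: str) -> bool:
    """Return True only when BOTH the IP address and the user-agent match."""
    ip = ipaddress.ip_address(remote_addr)
    ip_matches = any(ip in network for network in BLOCKED_NETWORKS)
    ua_matches = user_agent.startswith("Java")
    return ip_matches and ua_matches
```

Requiring both conditions keeps the rule narrow: a Java client from an address outside the list, or a browser from a listed address, is not blocked by this check.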

URL rewrite rules

Here are a URL rewriting condition and rule that match any user-agent that begins with “Java” and deliver a 403 Forbidden response for any HTTP request to your server. To restrict the block to particular IP addresses as well, add a second condition on the remote address.


RewriteCond %{HTTP_USER_AGENT} ^Java
RewriteRule ^/(.*)$ /$1 [F]

The condition matches any user-agent string that begins with “Java” no matter what comes later; the ^ anchor keeps it from matching “Java” in the middle of a string. The rewrite rule answers every requested location with a 403 Forbidden response code. There will be no change made to the URL and no document delivered.
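
To see how a pattern like this behaves before deploying it, the regular expression can be tried against sample strings. The sketch below uses Python's `re` module as a stand-in for the rewrite engine's matching, which is an assumption: engines differ in details, though simple patterns like these should behave the same way. The "ExampleBot" string is made up for illustration. It compares an unanchored Java.* with an anchored ^Java.

```python
import re

unanchored = re.compile(r"Java.*")  # matches "Java" anywhere in the string
anchored = re.compile(r"^Java")     # matches only at the start of the string

samples = [
    "Java/1.6.0_04",                    # from the list above
    "Java/1.6.0-oem",                   # from the list above
    "ExampleBot/1.0 (Java/1.6.0_04)",   # hypothetical string
    "Mozilla/5.0 (Windows NT 6.1)",     # browser-style prefix, no "Java"
]


def matches(pattern, user_agent):
    # re.search scans the whole string, like an unanchored condition pattern
    return pattern.search(user_agent) is not None


# Map each sample to (unanchored result, anchored result)
results = {ua: (matches(unanchored, ua), matches(anchored, ua)) for ua in samples}
for ua, (loose, strict) in results.items():
    print(f"{ua!r}: unanchored={loose} anchored={strict}")
```

Only the anchored form limits the match to strings that begin with “Java”; the unanchored form also catches the hypothetical string that mentions Java later on.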

IIS7 URL Rewrite web.config


<rule name="no-java-bots" stopProcessing="true">
    <match url="(.*)" />
    <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="^Java/.*" />
    </conditions>
    <action type="AbortRequest" />
</rule>

Why block Java bots?

Bots with a well-defined purpose will typically identify themselves with a unique name. These Java user-agents are either not interested in identifying their purpose or not ready to publish their name and take ownership of the crawling activities. Both cases are a waste of bandwidth. Test your new application on someone else’s website. Play with your shady crawler on someone else’s website. Come back when you are willing to identify yourself.

7 Comments so far

  1. Yuri on February 24th, 2009

    Thanks for the Java bot explanation. Recently I too have found these and other robots on a site.
    Many robots steal the contents of a site's pages, so the decision to block them is correct.

    But I do it a little differently: rewrite engine rules are not convenient for blocking ranges of IP addresses, so I do it with a PHP script, something like:
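    $block = array(
        "84.120.0.0-84.123.255.255",
        "122.198.0.0-122.198.255.255",
        "205.209.128.0-205.209.191.255"
    );
    // Returns true when $ip falls inside one of the "start-end" ranges in $block.
    // The loop body is a reconstruction: each range is split on "-" and compared numerically.
    function checkIP($ip) {
        global $block;
        $IP = ip2long($ip);
        for ($i = 0; $i < count($block); $i++) {
            list($b, $e) = explode("-", $block[$i]);
            $b_IP = ip2long($b);
            $e_IP = ip2long($e);
            if ($IP >= $b_IP && $IP <= $e_IP) return true;
        }
        return false;
    }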

    “Manually” blocking IP and UserAgent is not the best practice, so I use robot detection by pseudo-picture loading and JavaScript evaluation. But Java bots load all the pseudo-pictures and evaluate the JavaScript!
    One way to detect Java bots is by the User-Agent field, but it is not so difficult to change this field.
    What to do in this case?

  2. Corey on February 26th, 2009

    Yuri, rewrite rules can be implemented to block IP address ranges:

    RewriteCond %{REMOTE_ADDR} 213\.93\.196\.\d\d?\d?
    RewriteCond %{HTTP_USER_AGENT} ^Java
    RewriteRule ^/(.*)$ /$1 [F]

    \d represents a single digit in regular expressions, and a question mark ? makes the preceding character optional.

  3. [...] access my web site? I’ve also decided to block access to my web site by Java user agents. See How To Block Java User-Agents for someone else’s similar approach to the Java [...]

  4. Fernando Cassia on March 31st, 2011

    Why single out Java bots and not Silverlight, Flash, and unknown browsers as well?

    You will realize how stupid your paranoia is when people change the user agent to “MSIE 9, Win7 x64”, and are able to continue crawling your site.

    If you place a web site on the open internet, it's to be accessed by any user agent, not just your preference of browsers.

    I say F'You to people like you and your ilk, who don't have a clue about what the open internet is all about.

    FC

  5. Corey on March 31st, 2011

    Fernando:

    Because Java bots clog up my error logs and Java bots are used in SQL injection attacks. When other user agents abuse my websites, I will block them, too.

    This isn’t stupid or paranoid. It’s been successful for years; look at the date on this post.

    Sure, idiots can change their user-agents, and I can use other criteria to block their malicious intent. There’s no way to escape a server log, and analysis of logs is what helps me create solutions like this.

    You are wrong about not being able to choose who accesses my website. I can block whatever I want using mechanisms that are built into any modern web server.

    You would be surprised to learn that lots of servers use whitelists to filter traffic, a step further than the blacklists I maintain.

    Why should I tolerate attacks when the tools to block the least sophisticated are so easy to use?

  6. Jon on May 3rd, 2011

    You both have valid points and a bit of name-calling. I’m curious as to who has the stronger case.

  7. Jdilegge on May 31st, 2011

    There is always some annoying kid who has to start name-calling and acting like he is all-knowing.

    Love the post, it is accurate and useful.
