This week in Bot Oddities
ShopWikiBot still forgets file extensions
The first time ShopWikiBot hit a site of mine, I was surprised to see it requesting URLs on our IIS servers for files with no extension. The requests would otherwise have returned valid 200 responses for real documents, but ShopWikiBot was repeatedly leaving the “.asp” out of the URL.
I asked the ShopWiki people what the heck they were up to, and they provided me with a prompt, personalized and uninteresting response.
Hi Corey,
Thank you for letting us know about our crawler’s behavior. Our crawler tries to find the most efficient path to crawl your site, and occasionally tries invalid paths. It does quickly detect the error and corrects itself, so you should see invalid urls like this only very rarely. We apologize for any inconvenience this has caused and please let us know if this problem persists for an unreasonable length of time.
Regards,
Lauren
The requests have not stopped, and I remain intrigued. I am always guilty of thinking too hard about problems, so I cannot resist. What crawling strategy might this be? Is the ShopWikiBot simply drunk? Could a directory matching every file name on a website indicate something more about its structure?
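For anyone who wants to count these requests in their own logs, here is a minimal sketch in Python. It flags requested paths that have no extension but would match a real file if “.asp” were appended. The function name, the sample paths and the idea of keeping a set of known files are all assumptions made for illustration; nothing here comes from ShopWiki or from IIS itself.

```python
import posixpath


def missing_extension_hits(requested_paths, known_files, extension=".asp"):
    """Return requested paths that lack an extension but would match a
    known file once `extension` is appended.

    requested_paths: URL paths taken from a log (hypothetical input).
    known_files: set of URL paths that really exist on the site.
    """
    # IIS treats paths case-insensitively, so compare in lower case.
    known_lower = {name.lower() for name in known_files}
    hits = []
    for path in requested_paths:
        base = path.split("?", 1)[0]  # ignore any query string
        _, ext = posixpath.splitext(base)
        if ext or base.endswith("/"):
            continue  # already has an extension, or is a directory request
        if (base + extension).lower() in known_lower:
            hits.append(path)
    return hits


# Made-up sample data, for illustration only.
requested = [
    "/products/widget",
    "/products/widget.asp",
    "/about",
    "/",
    "/catalog/list?page=2",
]
known = {"/products/widget.asp", "/catalog/list.asp", "/contact.asp"}
suspicious = missing_extension_hits(requested, known)
```

With the sample data above, only the two extensionless paths that correspond to real “.asp” files are flagged. In practice the paths would be filtered by user agent first, so that only ShopWikiBot requests are examined.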
404s at fdfdkll.html
Do you know where to find fdfdkll.html? Googlebot thought he knew where to look, but he was wrong! Perhaps a Googlebot imposter is testing his 404 crawling technique.
What is fdfdkll.html?
If you are reading this, perhaps you also have pondered this question and searched for any mention of fdfdkll.html on the web. I am writing only because I have exhausted that search. Your guess is as good as mine.
Here’s the user agent and IP of the alleged Googlebot that requested fdfdkll.html from two of my sites this week:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
66.249.71.206
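Whether a visitor calling itself Googlebot is genuine can be checked with the reverse and forward DNS test that Google documents: look up the hostname for the IP, confirm it ends in googlebot.com or google.com, then resolve that hostname and confirm it maps back to the same IP. The sketch below is one way to write that in Python. The function names are my own, and the resolvers are passed in as parameters only so the logic can be exercised without a network connection.

```python
import socket

# Reverse DNS suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the reverse DNS hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses for hostname, or an empty list on failure."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_real_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Apply the reverse-then-forward DNS check to a claimed Googlebot IP."""
    hostname = reverse(ip)
    if not hostname:
        return False
    # Step 1: the reverse DNS name must sit under a Google crawler domain.
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    # Step 2: the hostname must resolve back to the original IP.
    return ip in forward(hostname)
```

Calling `is_real_googlebot("66.249.71.206")` performs live DNS lookups, so the answer depends on what DNS returns at the time you run it. The user agent string alone proves nothing, since anyone can send it.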
