Back on the blog train

I have not written any new posts here in quite some time, but I have not been sitting idle.

More than a year ago, I wrote about some eBay scams that were appearing on craigslist and AutoTrader. Since then, a few posts on this website have been receiving one to three comments a day. Moderating these comments became such a burden that I created a whole scam tracking website to try to handle the traffic. I am still working out the details. The effort I put toward these incoming stories and comments satisfies my desire to contribute to this website, so the result seems to be fewer new posts.

I am still committed to many of my older posts here. I recently updated my hide email code post. That source code helps you publish email addresses on web pages while shielding them from most email harvesting robots. The goal is to reduce spam.

Another way to reduce spam that occurs as a result of running websites is blocking the bad robots altogether. I have written a few popular posts on that subject that have inspired some great conversation, and I respond to incoming comments on those posts regularly.

Do not use Yahoo mail for important messages

The world’s largest free email provider is simply unreliable. Sure, most of the user experience is pleasant, but Yahoo bans entire network blocks when it finds a few instances of abuse. Senders receive error messages like this:

421 Message temporarily deferred

Temporary error from SMTP partner: smtp;421 Message temporarily deferred – 4.16.51. Please refer to http://help.yahoo.com/help/us/mail/defer/defer-06.html.

  1. the message you attempted to send exhibited characteristics indicative of spam, and/or
  2. emails from your network have been generating complaints from Yahoo! Mail users.

The problem with blocking a whole network is clear: blanket banishment harms innocent users. Do not use Yahoo mail for important messages, because Yahoo is blocking innocent senders along with the guilty ones. A few vandals can cause Yahoo to temporarily ban an entire ISP's user base.
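The 421 at the start of Yahoo's notice is an SMTP reply code. Under the SMTP standard (RFC 5321), codes beginning with 4 are transient failures that the sending server should retry later, while codes beginning with 5 are permanent refusals. Here is a minimal PHP sketch that classifies a reply line by that first digit; the function name smtpReplyClass is hypothetical and not part of any library.

```php
<?php
// Classify an SMTP reply line (e.g. "421 Message temporarily deferred")
// by the first digit of its three-digit reply code, per RFC 5321.
function smtpReplyClass( $line ) {
	// A reply line starts with a three-digit code from 2xx to 5xx
	if ( !preg_match( '/^([2-5])\d\d/', $line, $m ) ) {
		return 'unknown';
	}
	switch ( $m[1] ) {
		case '2': return 'success';      // e.g. 250 OK
		case '3': return 'intermediate'; // e.g. 354 Start mail input
		case '4': return 'transient';    // retry later, e.g. Yahoo's 421 deferral
		default:  return 'permanent';    // 5xx: do not retry unchanged
	}
}

echo smtpReplyClass( '421 Message temporarily deferred' ), "\n"; // transient
```

A "transient" result is why a deferred message usually sits in the sender's queue for a while before bouncing, rather than failing immediately.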

Disguise Email Addresses for online publishing

I just wrote this quick ASP code to disguise email addresses on web pages. There is a better way to obfuscate email addresses: I recommend using an image instead of text whenever you can, but sometimes a plain text email address link is more practical.

Some losers send spam email for a living, and will send garbage to any email address they can find online. Encoding email addresses as character codes cloaks them from some of the leeches. Plenty of websites will perform the conversion for you, but this morning I decided to write code to automate the task.

Here is a Classic ASP function that converts a string into HTML numeric character references, replacing each character with &#, its ASCII code, and a semicolon. PHP code below. Browsers display these references as normal text, so the casual visitor sees an ordinary address. The difference is that a robot reading the raw HTML sees only codes such as &#64;, which must be decoded before they look like an email address. This thin veil of secrecy is enough to fight off some email harvesting robodicks.


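' Convert each character of str to an HTML numeric character reference
' Example: "a@b" becomes "&#97;&#64;&#98;"
Public Function asciiDisguise( str )
	Dim build, i
	build = ""
	For i = 1 To Len( str )
		' Asc returns the character code of the i-th character
		build = build & "&#" & Asc( Mid( str, i, 1 )) & ";"
	Next
	asciiDisguise = build
End Function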

 

I will write a PHP version of this code some time soon and update this post.

UPDATE:

Here is the same function in PHP.


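// Convert each byte of $str to an HTML numeric character reference
// Example: asciiDisguise("a@b") returns "&#97;&#64;&#98;"
function asciiDisguise( $str ){
	$build = "";
	for( $i=0; $i<strlen( $str ); $i++ ){
		// ord() returns the byte value of the i-th character
		$build .= "&#" . ord( substr( $str, $i, 1 )) . ";";
	}
	return $build;
}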
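To put the function to work in a page, encode both the mailto: link target and the visible text. The PHP sketch below repeats the function so it runs on its own; the helper name disguisedMailtoLink and the address me@example.com are hypothetical examples. Browsers decode character references inside attribute values as well, so the link still opens the visitor's mail program.

```php
<?php
// Same conversion as above: each byte becomes an HTML numeric character reference
function asciiDisguise( $str ){
	$build = "";
	for( $i=0; $i<strlen( $str ); $i++ ){
		$build .= "&#" . ord( substr( $str, $i, 1 )) . ";";
	}
	return $build;
}

// Hypothetical helper: build a mailto link whose href and text are both encoded
function disguisedMailtoLink( $email ){
	$encoded = asciiDisguise( $email );
	return '<a href="' . asciiDisguise( 'mailto:' ) . $encoded . '">' . $encoded . '</a>';
}

echo disguisedMailtoLink( 'me@example.com' ), "\n";
```

Because ord() works on bytes, this is meant for plain ASCII addresses, which covers ordinary email addresses.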

Robot Genius ironically a dumb robot

I love the irony in a web crawler called RobotGenius making a dumb mistake. I had never heard of this company or its bot until this morning, when I found this comical 404 error in my logs:
404;http://www.myclientsite.com/javascript:jumpDetails()
robotgenius (http://robotgenius.net)
208.96.10.201

Robot Genius treats javascript: links as relative paths and requests them from the server. How genius!
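A crawler avoids this mistake by checking an href's scheme before resolving it against the page URL. Here is a minimal PHP sketch using the built-in parse_url(); the function name isCrawlableHref is hypothetical, and treating only http, https, and scheme-less (relative) links as crawlable is an assumption for illustration.

```php
<?php
// Return true if an href should be fetched by a crawler.
// Relative links have no scheme; javascript:, mailto:, etc. are skipped.
function isCrawlableHref( $href ) {
	$scheme = parse_url( $href, PHP_URL_SCHEME );
	if ( $scheme === false ) {
		return false; // seriously malformed URL
	}
	if ( $scheme === null ) {
		return true;  // relative link such as /about.html
	}
	return in_array( strtolower( $scheme ), array( 'http', 'https' ), true );
}

var_dump( isCrawlableHref( 'javascript:jumpDetails()' ) ); // bool(false)
```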

Hide posts from WP home page

Hiding posts from showing up on the home page is something I have been thinking about all week. Last night I finally dug into the WP Codex and figured out how to make it happen. Hide WordPress posts from appearing on your home page by adding their post IDs to the following array (where you see 603 and 621), and place this code before the loop starts in your theme’s main index file.


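<?php
// Exclude posts 603 and 621 from the home page; passing 'paged' keeps pagination working
if( is_home()){ query_posts(array('post__not_in' => array(603,621), 'paged' => get_query_var('paged'))); }
?>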

This code prevents posts from showing up on your home page, but still allows them to be accessed directly and shown in views other than the home page, such as category archives. The posts will also be included in your RSS feed.

F.A.Q.

  • Where do I put this code?

    Place the code before the loop starts in your theme’s index. To edit the index file, browse to Design > Theme Editor > Main Index Template. Place the code anywhere before this line: <?php if (have_posts()) ?>

  • How do I find the IDs of my posts?

    Go to Manage > Posts and mouseover the titles of your posts. Look at the URL showing up in your browser’s status bar. The post ID will be at the end of the edit post link URL you see.

Most ‘hidden post’ tutorials recommend using a hidden category. The problem with the category approach is that you then have to hide the category from the sidebar and any other place it may appear. WP does not have built-in hidden or secret categories, and chasing a category out of every place categories are shown on my WP site is not an attractive idea.

This code lets me publish posts that are not intended for prime time viewing but still work better as posts than as pages.

Display WP Post Category without link

Here is a small piece of code that will display the category name of a WordPress post without a hyperlink to the category page. Typically, category data is displayed with the_category( ), but that function prints each category as a link to its archive page, so it is not useful for working with the category name as plain text. Displaying the category in plain text is easy with get_the_category( ), however, because it returns the post's categories as an array of objects.


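<?php
// get_the_category() returns an array of category objects for the current post
$category = get_the_category();
if ( ! empty( $category ) ) {
	// Print the first category's name as plain text, escaped for HTML
	echo esc_html( $category[0]->cat_name );
}
?>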

GoDaddy email authentication errors

GoDaddy offers a decent email account for a few dollars a month. These mail servers are picky about what ports are used to send outgoing messages based on where the message originates. I think I have cracked the case. If you are encountering either of the following errors, you are welcome.

When sending outgoing messages from an email client, you may get a server response:
553 Sorry, that domain isn't in my list of allowed rcpthosts

When creating outgoing messages in a server-side script, you may intermittently encounter:
error: -2147220973 The transport failed to connect to the server.

The problem in both cases is the port number. GoDaddy's servers do not always accept connections on the standard SMTP port, 25. Try port 80 in your www script, and try port 3535 in your email client.
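If you are not sure which of these ports your network lets through, a quick TCP connection test narrows it down. The PHP sketch below tries each port with fsockopen() and returns the first one that accepts a connection; the function name firstReachablePort is hypothetical, and smtp.example.com is a placeholder for whatever outgoing server name GoDaddy gives you. A successful connection only shows that the port is reachable, not that the server will accept your login or your message.

```php
<?php
// Try each port in order and return the first one that accepts a TCP
// connection within $timeout seconds, or null if none do.
function firstReachablePort( $host, array $ports, $timeout = 5.0 ) {
	foreach ( $ports as $port ) {
		// @ suppresses the warning fsockopen() emits when a connection fails
		$fp = @fsockopen( $host, $port, $errno, $errstr, $timeout );
		if ( $fp !== false ) {
			fclose( $fp );
			return $port;
		}
	}
	return null;
}

// Placeholder host name: substitute your own outgoing mail server
// $port = firstReachablePort( 'smtp.example.com', array( 25, 80, 3535 ) );
```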

How to: Stop Spam on your MediaWiki website

This past week I deleted a few hundred wiki pages and user accounts from the MediaWiki installation our company uses to track software features and technical issues. Here is how you can stop your public MediaWiki website from becoming the victim of relentless spam bots.

  1. Limit exposure

    My wiki operated smoothly for about a year and a half before any spammer found it. A publicly accessible, poorly secured website is always a sitting duck, and once the site was indexed by search engines, finding it became a lot easier. A simple search query like Powered by MediaWiki will list thousands of targets for wiki spammers. I am not sure how search engine robots found our wiki, but I certainly know when: right before the onslaught of spam.

    Limit exposure to software security holes

    The very first change I made to my wiki was a complete update of the MediaWiki software running on the website. The longer a release has been available in the wild, the more likely it is that someone has found a security hole or a way to exploit its scripts to manipulate the site from outside.

    The backup and upgrade procedures are very intimidating to the casual user. I downloaded a copy of all the website’s files, uploaded the files that make up the latest release, uploaded my old LocalSettings.php so it was not changed with the update, and then ran the installation script again. When you re-run the MediaWiki installer, it will recognize the existing database tables and update them accordingly.

    Prevent Search Engines from indexing your MediaWiki

    I am now using the Robots Exclusion Protocol (robots.txt) to keep robots from crawling the entire site. The robots.txt file in the root of the website directory looks like this:

    User-agent: *
    Disallow: /

    Our wiki is for our use within the office only, so we couldn't care less if anyone else finds or reads the website's contents through a search engine. If removing your wiki from search engines is not an option, you can still stop spammers by following the rest of these instructions.

  2. Trip the bots

    The most aggressive spammers that attack MediaWiki websites are automated scripts. These scripts assume the wiki is an unmodified MediaWiki installation and push content through its standard page creation and editing routines. A simple CAPTCHA will trip the spam bots. Spammers don’t have time to figure out why they can’t pollute a particular MediaWiki website; they move on to easier targets. I installed the ConfirmEdit extension and configured it to require a simple arithmetic CAPTCHA before saving any edit.

    Restrict user account creation and anonymous editing

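    Here are the lines I added to LocalSettings.php to prevent new user registrations and anonymous (IP address only) edits. Old MediaWiki releases used a $wgWhitelistAccount setting for the first task, but current releases control account creation through $wgGroupPermissions:

    # Prevent new user registrations except by sysops
    $wgGroupPermissions['*']['createaccount'] = false;
    $wgGroupPermissions['sysop']['createaccount'] = true;
    
    # Restrict anonymous editing
    $wgGroupPermissions['*']['edit'] = false;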
    
  3. Learn how to police new content

    Within 30 days of the initial attack, my wiki had hundreds of new pages and user accounts. Garbage was being added so quickly that the Recent Changes page was no longer enough for me to see what was being added to my website. Here is a valuable page that outputs a list of every page on your wiki:

    http://www.yourmediawiki.com/index.php?title=Special:AllPages

    I also installed an extension called Nuke that facilitates quick mass deletion of any user’s contributions.

    Larger or highly active wikis will naturally be harder to keep free of spam. I am glad these spam bots found me only 18 months after I launched the wiki. Using Special:AllPages was only slightly painful because the total number of good pages on my wiki at the time was in the low hundreds. If the spam bots find another way to plague my website, I will surely write a second chapter to this guide.
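The Special:AllPages check from step 3 can also be scripted through the MediaWiki web API. The PHP sketch below builds an api.php request for list=allpages and pulls the page titles out of the JSON response. The wiki address is the same placeholder used above, the api.php location is an assumption about your installation, and both function names are hypothetical. Fetching the URL (for example with file_get_contents()) is left out so the sketch runs offline, and wikis with more pages than the limit return a continuation value for the next request, which this sketch ignores.

```php
<?php
// Build a MediaWiki API URL that lists up to $limit pages, like Special:AllPages.
function allPagesApiUrl( $apiBase, $limit = 500 ) {
	$params = array(
		'action'  => 'query',
		'list'    => 'allpages',
		'aplimit' => $limit,
		'format'  => 'json',
	);
	return $apiBase . '?' . http_build_query( $params );
}

// Extract page titles from a list=allpages JSON response string.
function allPagesTitles( $json ) {
	$data = json_decode( $json, true );
	$titles = array();
	if ( isset( $data['query']['allpages'] ) ) {
		foreach ( $data['query']['allpages'] as $page ) {
			$titles[] = $page['title'];
		}
	}
	return $titles;
}

echo allPagesApiUrl( 'http://www.yourmediawiki.com/api.php' ), "\n";
```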

User-agent wwloadgenerator

This just does not sound good. I have no idea what wwloadgenerator is, but it does not sound like a friend of mine. Has this user-agent visited you this month?

UPDATE 2011-02-24

I just tried to look up what this user-agent is today, and the only search engine result is this web page. This agent is coming from Germany: 80.66.20.180.
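To answer that question for your own site, you can scan your access log for the user-agent string. The PHP sketch below counts matching requests per client IP address; it assumes an Apache-style log where each line starts with the client IP, and the function name agentHitsByIp and the log path in the comment are hypothetical.

```php
<?php
// Count requests per client IP for log lines whose text contains $agent
// (case-insensitive). Assumes the IP address is the first space-separated field.
function agentHitsByIp( array $lines, $agent ) {
	$hits = array();
	foreach ( $lines as $line ) {
		if ( stripos( $line, $agent ) === false ) {
			continue;
		}
		$ip = strtok( $line, ' ' );
		$hits[$ip] = isset( $hits[$ip] ) ? $hits[$ip] + 1 : 1;
	}
	return $hits;
}

// Hypothetical log path; file() returns the log as an array of lines
// print_r( agentHitsByIp( file( '/var/log/apache2/access.log' ), 'wwloadgenerator' ) );
```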

New faster browser thanks to Google: Iron

SRWare Iron is a new lightweight web browser based on Chromium, Google's open source browser project, rather than on Google Chrome itself. Iron is like Chrome, but with all the Google-specific features removed. The result is a browser that uses almost 25% less memory than Chrome. Iron has the same slick interface, the same intuitive design, and the same great features, minus any communication with Google.

Here is a mashup of three screenshots I took of my Task Manager to verify the memory usage, but this Iron vs. Chrome Google Doc I created to outline Iron's memory savings is a lot easier to read.

When Google announced the upcoming release of its Chrome web browser, I was immediately guilty of checking hour after hour to see whether the Google Chrome website had launched. The hype was tremendous, and the browser is wonderful.

Google Chrome is wonderful

The makers of Iron agree. SRWare’s website is available in English and describes their interest in Google’s browser:

Google’s Web browser Chrome thrilled with an extremely fast site rendering, a sleek design and innovative features. But it also gets critic from data protection specialists, for reasons such as creating a unique user ID or the submission of entries to Google to generate suggestions. SRWare Iron is a real alternative. The browser is based on the Chromium-source and offers the same features as Chrome – but without the critical points that the privacy concern.
http://www.srware.net/en/software_srware_iron.php

Chromium-based projects are better

Because they do not have these specific Google Chrome features:

  1. Suggestion service as you type
    When you type into Google Chrome’s “omnibox,” a drop-down list suggests keywords and phrases that you may be searching for. This requires whatever you are typing to be sent to Google. No thanks. I know what I am typing. I do not need to connect to Google, upload what I am typing, and download a list of suggestions to help me finish typing. I do not want Google saving every search I perform, possibly to suggest it later to other users.
  2. Page not found alternatives
    If you encounter a 404 error, Google Chrome will sometimes connect to Google and upload the address of the page you are trying to access. Google may suggest an alternative file to browse instead of the page you cannot find. No thanks. I would like my web browser to never upload the location of the web pages I am browsing, unless that communication is via my ISP to the web server where the page resides during a normal HTTP request. Third parties have no business recording the URLs of pages I am browsing.
  3. Bad sites lists downloaded every 30 minutes
    Google Chrome connects to Google every 30 minutes to download a list of bad sites that may contain malware. No thanks. I like to download files and updates at my own pace. I do not need Google’s list of world wide web baddies to babysit my web browsing. If I am stupid enough to download adware or a virus, then I will learn a lesson while fixing my computer.

When I first started learning about search engines, I was obsessed with the Google Toolbar and its green PageRank meter. The fascination is, I think, a rite of passage for anyone who becomes interested in how search engines work.

I gave up the toolbar after realizing that in exchange for the instant gratification of a web page’s rating according to Google, I was wearing a wire!

Certain optional Toolbar features operate by sending Google the addresses or other information about sites when you visit them. Web History, PageRank, and Safe Browsing in Enhanced Mode all work this way.
http://www.google.com/support/toolbar/bin/static.py?page=privacy.html

No thanks! I am not interested in allowing Google to track every URL I visit and associate it with my IP address and a “unique application number” in their server logs. I traded Chrome for Iron for the same reason. SRWare has no interest in recording the usage of the browser they made based on the Chromium source. Google, however, is very much in the business of storing usage data and using it to improve their products.

Google’s Corporate Information and Software Principles page says:

We’re alarmed by what we believe is a growing disregard for your rights as computer users.

I could not agree more, and this growing disregard is exactly why I do not want any company to save every URL I visit and every character I type into a search box. This growing disregard is exactly why I stopped using the Google Toolbar. This growing disregard is exactly why I recommend SRWare Iron over Google Chrome.

After all, if Google has some reason to keep their web browsing information secret, I have a list of reasons to keep my own information a secret from Google.

Download Iron for free at SRWare.net


Thanks for reading!
