Spam Log is a WordPress plugin that writes a log entry for every comment marked as spam. The log file is suitable for processing by fail2ban.
Recently, I’ve encountered some very aggressive WordPress spam bots. These bots post a new spam comment almost every minute for hours on end. Needless to say my spam queue is a mess. I wrote the following plugin to solve this problem.
What is Spam Log?
Spam Log is a simple WordPress plugin that logs a message every time a comment is marked as spam. Each log message includes the IP address of the poster and the comment’s ID. The log can easily be processed by fail2ban. fail2ban is a daemon that scans log files for misbehaving clients and bans them by IP address. Here is sample output generated by Spam Log:
2009-04-20 04:15:03 comment id=527 from host=22.214.171.124 marked as spam
2009-04-20 04:18:15 comment id=528 from host=126.96.36.199 marked as spam
2009-04-20 04:20:36 comment id=529 from host=188.8.131.52 marked as spam
2009-04-20 04:21:46 comment id=530 from host=184.108.40.206 marked as spam
2009-04-20 04:22:49 comment id=531 from host=220.127.116.11 marked as spam
Why use Spam Log and fail2ban if Akismet/wp-recaptcha/etc. is already catching all the spam?
Many spammers post 50+ comments a day from a single IP address. Even if every comment is correctly marked as spam, the volume alone means that you can’t easily monitor the spam queue for false positives. Spam Log and fail2ban should considerably reduce the total amount of spam.
Even if spam comments never appear on your blog, they still waste valuable resources on your server. Low-memory virtual servers need all available resources for serving legitimate users. Banning spammers at the firewall before they ever connect to your web server is very efficient.
Upload the spam-log folder to the wp-content/plugins directory.
Active the plugin through the WordPress Admin menu.
Set the location of the spam log through Spam Log’s Options page in the WordPress Admin menu. By default, the location is set to wp-content/spam.log. The file or containing directory needs to be writeable by the user that the web server runs as. On Debian or Ubuntu systems, you can do the following:
Change logpath to the path you set on Spam Log’s Options page. This configuration will ban an IP address for a day if it’s used to post 5 comments within an hour that are marked as spam. Warning: Some captcha plugins mark comments as spam when a user fails a captcha. Be careful decreasing maxretry if you’re using such a plugin as there’s a risk that you will ban legitimate users.
In this post, I’ll cover rules 5-13 and summarize the results of the optimizations that I made to my site.
YSlow’s 13 Rules to Improve Web Site Performance (rules 5-13)
A number of these rules are fairly simple, so I’ll cover some together.
5. Put CSS at the top
6. Put JS at the bottom
7. Avoid CSS expressions
I’ll be honest here. I didn’t even know CSS had expressions until I saw this rule. YSlow recommends avoiding them because they are evaluated an absurd number of times.
8. Make JS and CSS external
External files can be cached indepently. If your users usually browse to more than one page on your site or you have frequent returning visitors, this should improve load times.
9. Reduce DNS lookups
DNS lookups can still take a fair amount of time even on a broadband connection. Basically, try to keep the number of unique host names fairly low. There is one competing performance benefit to serving page components from multiple hosts. It turns out that most browsers will only make two simultaneous connections to a given host. So, serving content from multiple hosts can speed up page load times especially if there are a lot of components.
10. Minify JS and CSS
The YUI Compressor is pretty easy to use, but using it is going to get annoying unless you automate. To solve this problem, I once again modified my publish CSS script to do the minifying automatically. If you read the first part of this article and you’re wondering how many times I’m going to modify my publish script, this is the final version:
11. Avoid redirects
12. Remove duplicate scripts
These two rules are fairly easy to follow. Sometimes you can’t avoid redirects such as when you move to a new domain or restructure your site. Those sort of redirects are good and they maintain the reputation your site has built with search engines. That said, a lot of redirects can be eliminated simply by adding a trailing slash to a URL. Be especially mindful if you’re working on hand-coded sites rather than CMS-based sites. Remove duplicate scripts is pretty obvious. Unfortunately, Google Adsense serves the same scripts for each ad on a page and there’s no way to fix it.
13. Configure ETags
I won’t say much about ETags. This YSlow rule actually means configure ETags or remove them entirely. For a site being served off a single web server, ETags offer no benefits and some drawbacks including the fact that Apache generates invalid ETags for gzipped content. I recommend turning them off unless you’re willing to configure them per Yahoo’s ETags best practices. To disable ETags in Apache add the following to the virtual host configuration or the main .htaccess:
Finally, here are the results that I obtained. First, my YSlow score:
I’m somewhat disappointed that after all of that, my score barely improved, moving from an F (59) to a D (63). Unfortunately, most of the remaining areas of optimization are either unrealistic as in Use a CDN or are out of my control.
What about real performance? The following tables include the minimum, maximum, and average times out of 10 page loads. First, with a cold cache:
Page Load Times Before and After Optimization (cold cache)
I’m actually fairly satisfied. Most of the optimizations I did would benefit repeat visitors rather than new visitors. A 15% performance improvement is not bad at all. Next, with a warm cache:
Page Load Times Before and After Optimization (warm cache)
These results baffled me. The improvements are only slightly better than the cold cache numbers and I really thought page loads would speed up more for repeat visitors. After some investigation, it turns out that I was loading pages in the wrong way. Specifically, I was hitting the f5 key to reload the page in my warm cache tests. Reloading with f5 has a special meaning in at least Firefox. It tells the browser to check that every item in the cache is the correct version even if the item is set to expire in 10 years. Here are the numbers without using f5:
Non-f5 Page Load Times After Optimization (warm cache)
Now, those are some fast page loads. I’m not interested enough to revert all my changes and re-test, but some statistics from YSlow suggest that I probably sped up warm cache page loads by a fair amount. Here’s the initial Stats tab, prior to any optimization:
And here’s the final Stats tab after all the optimizations:
Of particular note: in the case of a primed cache, I reduced the number of HTTP requests from 25 to 15.
Overall, I’m satisfied with the performance improvements. Even if the improvements are not amazing, working with YSlow is not particularly hard or time consuming. In fact, writing this article took much longer than the actual changes I made to my web site. A 15% reduction in load times is worth an hour or two of writing scripts and changing server configurations.
An article/tutorial on using the YSlow firebug extension to optimize web site performance.
I’m a big fan of Google Tech Talks. Recently, I caught a particularly interesting one from Steve Souders called Life’s Too Short – Write Fast Code. Steve Souders used to work for Yahoo! as their chief web performance guru; he now does much the same at Google. The talk was a continuation of a previous lecture on a firebug extension that he wrote called YSlow. Suffice it to say, he knows what he’s talking about when it comes to high performance web sites. Anyway, it piqued my interest, so I decided to try to improve the performance of my own site. This article/tutorial and the next chronicle the changes I made and the results that I obtained.
YSlow’s 13 Rules to Improve Web Site Performance (rules 1-4)
YSlow grades web sites based on 13 different rules. Browse to your site, click on the YSlow icon in the lower right corner of Firefox, and click on the Performance tab. Here’s how one post on my site ranked before making any changes:
Clicking on any rule takes you to a page explaining it in more detail.
A quick note: the screenshot above was taken before I disabled Woopra, which is web analytics software similar to Google Analytics. I had tried it out a couple months ago and had simply forgotten that I had it enabled. Disabling Woopra improved my grades very slighty in a number of rules, but I don’t include that optimization in the main article because it’s technically a functional change to my site. As explained below, YSlow doesn’t make functional recommendations.
What follows are the changes that I made to my site, rule-by-rule.
1. Make fewer HTTP requests
Expanding the rule will show counts of each component if they exceed YSlow’s thresolds:
Combining CSS into one file–especially when it’s all hosted on the same server–is a really good idea. In this case, I’m serving my main style.css file and two other stylesheets from the wp-recaptcha plugin and the Deko Boko plugin. I wrote a simple script to combine all of the files into one, like so:
I also modified both plugins to no longer include a link to their original stylesheets in the <head> section of the page. Search the PHP files for references to the original stylesheets and comment those lines out.
Before moving on, it’s worth mentioning that you should use CSS sprites if at all possible. Many sites use tens of icons per page. Combining those tiny icons into one sprite is a huge performance boost.
2. Use a CDN
One persistent criticism of YSlow is that it focuses on performance improvements to major sites. I don’t really agree with that; in fact, nearly every rule applies equally to both small and large sites. Use a CDN is the exception. A CDN allows you to serve content to users from geographically close servers. It’s a decent performance improvement, but it’s also incredibly expensive.
3. Add an Expires header
What’s the fastest HTTP request? One that never takes place. The idea behind far future Expires headers is that you should allow permanent caching of any page component that won’t change or even that is unlikely to change very often. If it does need to be modified, you simply rename it. To accomplish this using Apache, enable mod_expires and add the following to the virtual host configuration or the main .htaccess:
ExpiresByType image/gif "access plus 10 years"
ExpiresByType image/jpeg "access plus 10 years"
ExpiresByType image/png "access plus 10 years"
ExpiresByType text/css "access plus 10 years"
Warning: The preceding configuration means that any time you change an image or stylesheet served my Apache, you must rename it. If a browser or proxy server has the component cached, it will never check to see if its version is the same as the version on your server. For images, this is probably not an annoyance; changing an image usually means uploading a new version in which case it will have a different file name anyway. For stylesheets, this can be annoying if you don’t automate the process.
I solved the stylesheet problem with a script. In the same directory as style.css, I created a text file like so:
$ echo 0 > serial.txt
Next, I modified the script that combined the 3 original stylesheets I was using like so:
The script reads a number in from serial.txt, increments it, creates a new stylesheet with the number appended, and in this case modifies WordPress’ header.php to link to the new stylesheet. Every time I want a different style on my site, I modifiy style.css.original and run this script.
4. Gzip components
Everyone in every circumstance should enable mod_deflate. Way back in the early days of the web there were browsers that didn’t play well with gzip; they basically don’t exist anymore. In Apache, enable mod_deflate–it’s probably already enabled–and add the following to the virtual host configuration or the main .htacess:
When I started writing this post, I didn’t realize how long it was going to be. Part two, which will include the other 9 easier-to-implement and less important rules, should be up within a few days. I’ll also include some before-and-after performance statistics.
Firefox 3 added native form widgets on Linux. Most of the time they look great, but on some sites including mine, they look awful. Here’s how I styled my forms to avoid native widgets.
Actually, the native widgets only look terrible if you’re on Linux with certain gtk+ themes and you’re viewing a web site with a dark background. Unfortunately, all of those conditions apply to me. See below:
As you can see, there’s an ugly white box around the otherwise rounded widgets. Originally, I thought my CSS was lacking, but after hours–literally–of googling I determined that Firefox’s implementation of gtk+ widgets is just shoddy. Of course, on a site with a white background, they look great. I’m reminded of that Henry Ford quote: “Any customer can have a car painted any colour that he wants as long as it is black.”
Applying any styling to the input elements will completely disable the native widgets. I ended up writing some basic CSS:
They don’t look nearly as good as native widgets on a white background, but they look a whole lot better than native widgets on my site. See below:
I also added some input focus bling to the form:
Even though they look alright now, I’ve decided that my next major project on my blog should be a complete theme redesign. Traditional color schemes are easier to work with and I’ll also get familiar with a lot of WordPress PHP code.