mediawiki antispam measures: $wgSpamRegex

There are several ways to stop spammers and vandals from posting in MediaWiki. The main one is how easily a sysop can undo or roll back an edit: just by pressing a button.

rollback and undo in mediawiki

However, this strategy works a posteriori, and filtering a priori is preferable, at least for the most basic rules. This is the first in a series of posts on antispam measures.

$wgSpamRegex

Without installing any extensions, an easy way to block unwanted content is the $wgSpamRegex variable in LocalSettings.php:

$wgSpamRegex = "/buy-viagra/";

But take care with this. The value is a regular expression and matches anywhere in the text, so “/cialis/” would also block words like “specialist” all over your wiki!
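One way to reduce this kind of false positive is to anchor the term with word boundaries. This is a minimal sketch, not something taken from the MediaWiki manual; the `\b` anchors and the `i` flag are my own additions for illustration.

```php
// LocalSettings.php
// \b marks a word boundary, so "cialis" is only blocked as a whole word.
// "specialist" no longer matches, because "cialis" sits inside a longer word.
$wgSpamRegex = '/\bcialis\b/i';
```

Single quotes keep PHP from interpreting the backslashes, so the pattern reaches the regex engine unchanged.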

CSS hidden or display:none

A different problem is a spammer adding code and hiding it. What’s the point? The spam code (usually some JavaScript) still works even though the ordinary visitor cannot see it. So $wgSpamRegex should also block CSS such as height:0px; or display:none:

$wgSpamRegex = "/".
  "height\s*:\s*[0-4]px|".
  "display\s*:\s*none".
  "/i";

Here there is an unwanted side effect: this blocks text pasted from Microsoft Word into MediaWiki. The reason is well known: Word attaches a great deal of inline CSS to pasted text, including rules like the ones blocked here. There is no real solution for this. You can suggest a workaround: paste into an intermediate plain-text editor such as Notepad, then copy from there into MediaWiki. That should remove all styles. But not all users will be eager to do this.

The final code

The code suggested in the MediaWiki $wgSpamRegex manual is too extensive to maintain.

In my opinion, there is no point in adding a line here for each website that is blocked for spam. I will cover that in a later post. By default, the regular expression filter should block just the hiding CSS and some malicious characters, and only that.

The solution, copied from the manual talk page, is:

$wgSpamRegex = "/<.*style.*?(display|position|overflow|visibility|height)\s*:.*?>/i";
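To make that one-liner easier to read and maintain, the same expression can be written with the PCRE `x` (extended) flag, which ignores whitespace in the pattern and allows `#` comments. This is only a sketch: it assumes MediaWiki hands the value to PHP's regex functions unchanged, and the layout and comments are mine, not from the manual.

```php
// LocalSettings.php
// Same pattern as the one-liner, with \s* for optional whitespace.
// The x flag ignores whitespace in the pattern and allows # comments.
$wgSpamRegex = '/
    <              # opening angle bracket of an HTML tag
    .*style        # anything, then a style attribute
    .*?            # as little as possible up to the property name
    (display|position|overflow|visibility|height)
    \s*:           # optional whitespace, then the colon
    .*?>           # rest of the tag, up to the closing bracket
/ix';
```

Avoid slashes and single quotes inside the comments: the first would end the pattern early and the second would end the PHP string.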

The mediawiki antispam measures: $wgSpamRegex by geometrus, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.


3 Responses to mediawiki antispam measures: $wgSpamRegex

  1. LWeller says:

    Can you use $wgSpamRegex to block all external links, then allow only links that are specified in a list – probably listed directly after the “don’t allow” expression?

  2. fulgen says:

    I am not sure this would work alone. I’d suggest using both the Blacklist and WhiteList extensions:

    http://www.mediawiki.org/wiki/Extension:Blacklist

    http://www.mediawiki.org/wiki/Extension:WhiteList

    The $wgSpamRegex is a low-levelish way of blocking sites, and would run earlier in LocalSettings.

    In any case, this is all theory and you should test it – please write about it!

  3. LWELLER says:

    I looked at the blacklist extension, but it’s written so that you add just the domain name without the http://. Since I don’t know much PHP, it’s a puzzle how to modify this to simply block all text that begins with http:// – and might even need https:// and ftp:// and similar. If anyone has advice, it would be appreciated.
