Blocks, both technical and mental

Blocking content from the Internet is getting a lot of press of late. The last couple of weeks have seen the Pirate Bay being blocked by a number of large ISPs and debate over whether the blocking of “adult” content should be opt-in or opt-out.

Unfortunately the enthusiasm to “protect the children” and “protect the copyright holders” seems to have pushed aside much of the debate over whether we should be doing this at all or whether it’s practical.

Whether we should be doing it or not is political. I have my opinions ((I’m basically anti-censorship and in favour of personal responsibility. There are already laws covering the distribution of obscene materials, so why should there be restrictions on legal materials?)) but what I want to concentrate on here is whether or not blocking such content is actually possible.

There are a number of different ways of vetting content. They’re not necessarily mutually exclusive, but they’re all deeply flawed.

First, a common one from politicians: the Internet is just like TV and cinema:

Perry said that she has been accused of censorship over the campaign, but argued that the internet was no different to TV and radio and should be regulated accordingly.

No, no it isn’t. There are a handful of TV channels, even taking cable and satellite into account, and a relatively small number of movies released every week. It’s practical to rate movies. TV programmes are distributed centrally, so pressure can be placed on a small number of UK-based commercial entities when they do naughty things.

The Internet is very different. Firstly, counting the number of web pages is rather harder. This is what Wikipedia has to say:

As of March 2009, the indexable web contains at least 25.21 billion pages.[79] On July 25, 2008, Google software engineers Jesse Alpert and Nissan Hajaj announced that Google Search had discovered one trillion unique URLs.

Note that even the smaller number is from three years ago. I’d bet that it’s not smaller now. Clearly the same system of rating and regulation isn’t going to work on that scale. And even if it were possible to rate each of these sites, the UK government has little leverage over foreign websites.

There are basically three ways to automate the process: white list, black list and keyword scanning.

A white list says “you can visit these websites.” Even assuming no new websites are ever added and no new content is ever created, rating those 25 billion pages is not practical. I don’t think we want an official approved reading list.
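To put a rough number on “not practical”, here is a back-of-the-envelope sketch. The page count is the Wikipedia figure quoted above; the one-minute review time and the working hours are my own assumptions, picked only to show the order of magnitude.

```python
# Page count from the Wikipedia figure quoted above.
INDEXABLE_PAGES = 25.21e9

# Everything below is an assumption made up for illustration.
MINUTES_PER_PAGE = 1          # assumed time to rate one page
HOURS_PER_DAY = 8             # assumed working day
WORKING_DAYS_PER_YEAR = 230   # assumed working year

pages_per_reviewer_year = (60 / MINUTES_PER_PAGE) * HOURS_PER_DAY * WORKING_DAYS_PER_YEAR
reviewer_years = INDEXABLE_PAGES / pages_per_reviewer_year

print(f"Pages one reviewer can rate in a year: {pages_per_reviewer_year:,.0f}")
print(f"Reviewer-years to rate every page once: {reviewer_years:,.0f}")
```

Under those assumptions the answer comes out in the hundreds of thousands of reviewer-years, and that is for a single pass over a web that never changes.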

A black list is the opposite: “you can visit anything except these pages.” We have the same scale problem as with white lists and a few more. Much of the Internet is “user contributed” and it’s not hard to create new sites. If my site is blocked, I can create a new one with the same content very, very quickly. Basically, there’s just no way to keep on top of new content.
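The two list approaches fail in opposite directions, which a minimal sketch makes plain. The hostnames below are hypothetical placeholders, and real filters are more elaborate than a set lookup, but the basic behaviour is the same.

```python
from urllib.parse import urlsplit

# Hypothetical lists, for illustration only.
WHITE_LIST = {"approved-news.example"}
BLACK_LIST = {"blocked-site.example"}

def host_of(url: str) -> str:
    """Extract the lower-cased hostname from a URL."""
    return (urlsplit(url).hostname or "").lower()

def white_list_allows(url: str) -> bool:
    # Only hosts someone has already approved get through.
    return host_of(url) in WHITE_LIST

def black_list_allows(url: str) -> bool:
    # Everything gets through unless someone has already listed it.
    return host_of(url) not in BLACK_LIST

new_legit_site = "http://brand-new-blog.example/first-post"
blocked_site = "http://blocked-site.example/page"
mirror_site = "http://blocked-site-mirror.example/page"

print(white_list_allows(new_legit_site))  # legitimate but unlisted site
print(black_list_allows(blocked_site))    # the site that was listed
print(black_list_allows(mirror_site))     # same content, new hostname
```

The white list refuses the new, perfectly legitimate site because nobody has rated it yet. The black list stops the original site but waves the mirror through until someone notices it and adds it.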

Keyword scanning is exactly what it sounds like. Your internet traffic is scanned and if certain keywords are spotted, the page is blocked. It’s automated and dynamic, but what keywords do you look for? “Sex”? Well, do you want to block “sex education” websites? “Porn”? That would block anti-porn discussion as well as the real thing.

The scanners can be a lot more sophisticated than this but the fundamental problem remains: there’s no way to be sure that they are blocking the correct content. Both good and bad sites are blocked, and still with no guarantee that nothing untoward gets through.
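A naive keyword scanner shows both failure modes at once. This is a sketch, not how any particular ISP's filter works: the keyword list and the sample page texts are invented for the example.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
BLOCKED_KEYWORDS = {"sex", "porn"}

def is_blocked(page_text: str) -> bool:
    """Return True if any blocked keyword appears as a whole word."""
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    return not BLOCKED_KEYWORDS.isdisjoint(words)

# Invented sample pages.
pages = {
    "sex education": "A guide to sex education for secondary schools",
    "anti-porn essay": "Why porn is harmful: a campaigner's view",
    "obfuscated adult page": "Hot p0rn pics, s3x videos",
    "gardening": "How to prune roses in spring",
}

results = {name: is_blocked(text) for name, text in pages.items()}
for name, blocked in results.items():
    print(f"{name}: {'blocked' if blocked else 'allowed'}")
```

The education page and the anti-porn essay are blocked, while the page that swaps a letter for a digit is allowed. Adding more keywords or fuzzier matching shifts where the mistakes land, but it doesn't remove them.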

In all cases, if children can still access “adult” content with relative ease — both deliberately and accidentally — what’s the point?

Of course I’m not in favour of taking content without paying for it or exposing children to inappropriate material. But, to use a cliche, the genie is out of the bottle. As with the reaction to WikiLeaks, there is little point in pretending that nothing has changed or that the same techniques and tools can still be used to fight it.

Instead, if you’re a publisher you need to make your content legally available and easier to access than the alternative. iTunes has shown that people are willing to pay. So far, you’ve mostly shown that you’d rather treat paying customers as criminals. That’s not helping.

As for protecting children, it all comes back to being a responsible parent. Put the computer in the living room. Talk to them. Sure, use white or black lists or filtering, just be aware that it can never be 100% effective and that not everyone has children that need protecting. Whatever the Daily Mail and your technically unaware MP say, you can’t just say “the connection is being checked, problem solved.”