Perl

Introduction

Many developers would hate to have their master-work described as a mess, but not Larry Wall, creator of Perl and celebrity hacker. The way he sees it, the language is a mess because the problem domain — real life — is also a mess. He has a point.

I first came across Perl a few years ago when I was writing a program that required a certain amount of ‘screen-scraping’ from a telnet session, the ability to retrieve files using FTP, some complex processing and interaction with an Oracle database. This is a fairly messy problem, and one that Perl looked eminently able to solve.

Originally I came up with a design involving precarious shell-script creations, plus PL/SQL and a pile of other logic thrown in for good measure. This is even more messy and, worse, I can’t see how it would ever have worked. Then, at various times, people would suggest Perl. It’s good at text handling one would say. You can connect it to a database another would say. But it was the telnet library that sold it to me. I still hadn’t figured out a reliable way of doing that in a shell script.

The philosophy

With the tools that I already had, the problem may well have been impossible. I could have done it in C, but I didn’t have the luxury of time. I might have figured out a way of doing it with shell scripts, but it’d be a nightmare to debug and support.

It sure looked possible in Perl, and it had to be easier than doing it in C.

The Perl philosophy, it turns out, is make the every day things easy and everything else possible. The kind of thing that I wanted to do wasn’t exactly typical Perl fare, but it had all the right elements.

Much of the time I use Perl like a super-shell script. Thinking of it as such is not completely wrong, but it does do the software a great disservice. Sure, you can use it like that. But you can also make ‘real’ programs, complete with declared variables, objects and a GUI user interface.

You can do a lot in Perl, and it doesn’t try to cramp your style. Want to use objects? Go ahead! Think they look too complex? You don’t have to use them! You can write programs like glorified shell-scripts, or like C. Roll your own, use a library, the choice is yours.

This neatly brings me to another one of Perl’s philosophies. (The Perl definition of a philosophy is as messy as the language and the problem you end up solving with it.) This other message is: There Is More Than One Way To Do It.

Let’s try a trivial example: the if statement. Most languages insist you do something like this:

if ($a == 5) {
  print "a is five!n";
}

Indeed, this is perfectly valid Perl. But so is this:

print "a is five!" if ($a == 5);

I don’t necessarily think that this represents good programming style, but then I don’t have to use it. My choice.

The syntax

As you can see in the above example, Perl bears more than a passing resemblance to C. It uses the double equals (“==”) to compare numbers and semi-colons as statement terminators.

It’s also a bit like Unix shell scripts. Note the use of the dollar to identify variables. However, Perl is much more insistent on their use than Bourne. You must use the dollar all the time, even in assignments. This gets annoying if you change between languages with any degree of regularity.

The good news is that Perl doesn’t always use a dollar sign to identify variables. The bad news is that it also uses the at sign (@), the percent sign (%) and the ampersand (&). (At least these are the common ones. There was a joke going round recently suggesting that Perl 5.6 only supported Unicode character because they’d run out of symbols on the normal keyboard. At least I hope it was a joke.)

Fortunately it’s not completely indiscriminate. It turns out that Perl has only three data types: scalars, lists and hashes. The dollar identifies scalars, variables that accept a single value, at’s are used when you want to put several values in a single variable and the percent sign is used to identify associative arrays (which Perl calls hashes).

For example:

# Scalars
$a = "hello";
$b = 1.2;
# Lists
@c = ("hello", "goodbye");
@d = ($a, $b);
# Hashes
%e = ( "var1" => "value 1", "var2" => "value 2" );
# Output
print "$a, $c[0], ", $e{var1}, "n";

There are several interesting things to note here. Firstly, you don’t have to declare your variables (although you can if you want). Secondly, scalars store any single value, whether number, character or string. They are weakly typed in the same way that Variants are in Visual Basic; the system knows what’s there and will do different things based on that information. We’ll see more of this later.

Lists can store any number of scalars (which, as you can see, don’t all have to be what most other languages would consider the same type), with Perl performing all the memory allocation and deallocation, much the same way as the much-vaunted Java garbage collector. Many of the same properties apply to hashes.

Perhaps the final interesting property is the way you read from the non-scalar types. As you can see, you must use the dollar — scalar — to access them. This does make sense when you stop to think about it: you’re reading a single value not the complete list (which is still represented using the at sign).

The Perl Difference

All this variable stuff is unusual, but it doesn’t make it stand head and shoulders above everything else (in fact, the weak-ish typing and lack of user-defined types make it much worse than many others). But Perl is used in just about every CGI script in existence for a very good reason.

Is it the same reason that Perl has a reputation for being unreadable. Generally Perl is no less readable than, say, C but the one aspect that confuses just about everyone the first time are regular expressions. If you like code that looks like line-noise, this is the feature for you!

Regular Expressions are a way of representing patterns. For example this line could represent UK postal codes:

^[A-Z]+[0-9]+ [0-9][A-Z]+$

Or in English: one or more letters at the beginning of the line, followed by some digits. Then there’s a space followed by a single digit and one or more letters. The pedantic might note that all the letters must be uppercase.

Like all the usual Unix command-line tools, Perl allows you to look for and manipulate patterns in files. Perl extends the usual array of tokens allowing you to make fantastical expressions that quickly become completely unreadable. The power comes from the fact that all this is embedded right into the language, no clumsy function calls.

while ($line = <>) {
  chomp $line;
  if ($line =~ /1234/) {
    print ":$linen";
  }
}

Note here that in the third line I compare the input (“<>” reads the next line from the currently open stream, which is stdin by default) to an expression. Clearly this is a very simple example, but you should be able to see that having this functionality builtin gives the language the ability to express some very complex ideas concisely.

Extras

One of the things that Perl does better than just about any other language is plug-in modules. Perl 5 added some clunky object-oriented-like features and, while they may not be elegant, they do seem to work.

A testament to their power is the number of modules available for free download at CPAN (the Comprehensive Perl Archive Network). There are modules for connecting to just about any relational database, libraries to talk to all the Internet protocols I could think of and code to deal with XML and configuration files (or XML configuration files). There are so many modules you rarely have to write much in the way of code yourself.

Summary

I’ve barely begun to scratch the surface of Perl here. There’s much more to it, but the beauty of the language is that you don’t actually need to know that.

The language itself has a tendancy to look ugly and, often, unreadable. But it’s used just about everywhere you can find a Unix box. There are few other languages that come close to Perl for hacking text files and automating boring system administration tasks. It takes over where shell scripts leave off and only starts to run out of steam for projects thousands of lines long (it’s technically able to cope with more, but it’s not a real software engineering language). System Administrators don’t call it a Swiss Army Chain-saw for no reason.

As a computer language purist, I really want to hate Perl. It has weak typing, no data structures, it’s proud of the fact that there’s more than one way to do everything and the syntax just looks plain ugly. But this is not a pure or perfect universe. These very ‘flaws’ make Perl the ideal tool for the jobs it was designed for.

I may not actually like it, but I use Perl for just about all my hacking activities. It’s just too useful to ignore.

Whats with Windows 2000?

It is now two days after Microsoft’s official release of ‘the next generation’ of their premier operating system, Windows 2000 (n?e, NT5). We’re now at a safe distance to be able to assess the impact it has had on people and the press.

The first interesting thing to note is that on Slashdot, the Internet’s favorite site for hacker-oriented hi-tech news, did not make any announcement. One argument is that Slashdot is Linux, or at least Unix, biased making Windows news irrelevant. I don’t buy that. What Microsoft is doing is important if Linux is to achieve world domination.

The real answer came as a comment to another thread (about a new development kernel release), not by the sites editors. 17th February is not really a very significant date even to Microsoft. The software has been available to big customers — the main target market — for months and even smaller customers should not have had too much difficulty finding a copy. The only significance is that you can buy a shrink-wrapped copy. Big deal.

But should you buy a copy?

This brings me to my second point: despite a sprinkling of pro-Linux-is-Microsoft-doomed? articles, almost all the press I’ve read pretty much follows the line of Microsoft’s PR company. Whatever happened to reasoned, critical journalism?

Since there’s so much of this, I’m loath to identify individual magazines or articles, it just wouldn’t feel representative. The kind of thing I’m talking about are blanket statements such as “Windows 2000 is faster, more scalable and more reliable than NT.”

Where do they get this from? There is certainly no ‘real world’ evidence of this. If you discount this months release, people have been trialing the OS on small, test systems for a few weeks. Without a realistic load who can say, honestly, that it’s more reliable? And does ‘more reliable’ just mean ‘better than NT4’ or does it mean ‘as good as ‘Unix’? (Personally I believe neither interpretation. I very much doubt that a first release can be as reliable as NT4 with all the service packs, and that’s before we get to the months of uptime you can expect from a well configured Unix box.)

If reliability is difficult to understand, more scalable is laughable. At work we’re using a four processor Xeon 550Mhz machine with tonnes of disk-space. Right now there are very few Intel boxes that are bigger than that. Okay Win2K may support that hardware better than NT4 but it’s still the biggest you can get. An equivalent Sun machine probably falls into the ‘midrange’ category. What happened to the Alpha support? What happened to the PowerPC port? Both these architectures are far more scalable. And Linux, popularly believed to be less scalable than NT, supports them all.

So far, this piece is definitely painted as an anti-Microsoft tirade. That’s not going to change substantially, but Microsoft does deserve some credit for getting something the size and complexity of Win2K out the door at all. Check the metrics and success rate of projects that are thirty-five millions lines long. And there are some nice features. The GUI admin tools are not matched on any Unix implementation I’ve used and some things, such as the file protection and the separation of web applications from the web server, are long overdue.

However, the late delivery, high price and Microsoft-only nature of many new features don’t help in Microsoft’s defense against the monopoly allegations.

Photography, opinions and other random ramblings by Stephen Darlington