Tag Archives: Programming

Cooledit 3.5.3

Introduction

Text editors are very personal things. I found this out when I wrote a very positive review of NEdit. I received a number of comments, nearly all by email. No-one agreed with me, which doesn’t by itself bother me. But no-one disagreed with me as such either. All the messages were from Emacs users who thought that it was the only real editor and that I was misguided thinking otherwise.

This time I think I’m unlikely to incur the wrath of Emacs users. Cooledit is just not going to appeal to the same people. If Emacs is equivalent to O’Reilly books, Cooledit is a Dummies Guide.

Look and feel

This is going to become the theme of the review: text editors are very personal things. When I say that I’m not keen on either the look or feel of Cooledit, I must also point out that there is nothing wrong with it either.

There are good bits. I like the default colour scheme. White on blue is very easy on the eyes. And the default, fixed-space typeface is very readable, too — I’m not sure what it is, though. The tool-bar down the left-hand side of the text window is useful and it would be immediately obvious what the buttons did even if it didn’t have text on the icons and ‘tool-tips.’

For a text editor with an emphasis on ease-of-use, Cooledit is actually quite powerful. In addition to the normal cut and paste, search and replace and file management facilities, Cooledit also has built in scripting and dialog-based configuration of keyboard shortcuts and the environment.

It does, however, look a little amateurish. While the menus do have Office 97-style highlighting when the mouse point moves over them, it’s difficult to take them seriously when they drop down. They look so big and clunky that you can immediately tell that they’ve been designed by a programmer. They’re not easy to get rid of if you pull down a menu by mistake, either.

And let’s not forget Cooledit’s most unique feature: it’s multiple document interface. Inside the main Cooledit window, you can have as many documents as you like. Behind all the windows is a picture of an igloo; it’s not entirely clear why it’s there. Documents can be moved around by dragging their border, and can be resized by dragging the bottom-right corner. These inner-windows don’t have title bars, which after years of finding a documents name by looking in the title-bar is a little confusing.

The interface does not take full advantage of its multiple windows, either. The load button and the open menu item actually operate on the currently open window rather than for Cooledit as a whole. This means that to open another file you need to open a new window (from the Window menu) and then load a file into it. This kind of operation would be acceptable for a program with only a single buffer — such as vi — but for a modern, GUI editor it’s just odd.

Conclusion

This has been a rather short and very negative review. As I have said, there is nothing strictly speaking wrong with Cooledit, it just feels wrong. It has all the right things in the right place; functionally it’s at least as good as NEdit. But it’s idiosyncratic user interface and ‘jokey’ appearance have a tendency to distract you from your work!

If you like to be different and you liked Borland’s cartoon ticks and crosses in its Windows applications, then you might go for Cooledit. If not stay well clear.

Extreme Programming Explained

Introduction

Naturally the key selling point of Extreme Programming — reduced risk and increased fun — appeal to me. I’ve worked on many projects that were either risky, no fun or both and any way to improve that would be a good idea.

However, most of the successful projects were run using fairly heavy-weight methodologies, CMM or ISO accredited, for example. Extreme Programming promises to deliver the benefits while still being simple. I was sceptical. This sounds a little too close to one of Fred Brookes Silver Bullets.

How does it work?

Extreme Programming sounds too simple to ever work. None of its components are new or especially controversial, some are even very common throughout the industry. What’s new is their application together. The key is that the strengths of one part paper over the weaknesses of others.

I won’t detail all the bits — that’s what the book is for — but there are a couple of things I want to talk about.

The one thing that came through in all other reviews of the book was the pair programming aspect. The idea is that whenever writing code, there are always two people sat at the computer. This sounds very wasteful of resources but is based on some very sound theories: reviewing code is good and sharing knowledge around the team reduces the dependency on any one person. XP takes it to the extreme (hence the name) by having, in effect, every line of code being reviewed by two people all the time.

I can certainly see the benefits, but I can imagine a client being very suspicious when you suddenly have two people appearing to do a job you previously only had one doing. He also proposes using a utilisation metric which documents that people are actually not doing genuinely productive work all the time.

Anyone that writes software will immediately recognise that a lot of time is spent helping colleagues, in design or review meetings and any number of other things that are not directly and visibly related to increasing the functionality of a program. Making it more visible sounds like a much harder “sell” than is suggested in the book.

Is it easy to understand?

Brock is clearly very enthusiastic about his methodology and this shows in his prose, which is clear, friendly and concise. It’s sensibly organised and it usually answers questions you’re thinking of in just the right place.

However, it only does all that if you remember what the book is trying to explain. It’s trying to explain Extreme Programming not how to implement it effectively. I had a little difficulty seeing how you could automate tests in some environments. We tried and failed with Oracle Forms (or whatever it’s called this week) for example. I don’t disagree with the principle, but there are some definite technical barriers that are not even acknowledged.

I’m not sure that the distinction between describing what XP is and how to implement it is a good one. It sounds like an excuse to sell two books when only one would suffice. Why would anyone be interested in learning the theory but not the practice of a simple methodology?

Disadvantages

Beck does not claim that Extreme Programming works in all circumstances. There’s an entire (albeit short) chapter on it, which is quite refreshing. Most people would claim that their baby was ideal in all circumstances!

I only thought of a couple of other problems that are not addressed, although neither are necessarily any more of a problem with XP than with other methodologies. (However, they are areas that are claimed to be addressed.)

Documentation is never kept up to date anyway, and in XP there is a much greater chance of at least one person knowing how it works. Couple that with a full set of tests, the ‘business’ person being on the team right from the start and, well, what’s my problem?

The problem is identified in Death March. Who are the stake holders on the project? Is it your team-member (the end user)? Is it their manager, who might want a bunch of reports? Is it the CTO who doesn’t actually care what the system does and just wants a success of some kind in order to get promoted?! The book and the Extreme Programming approach treat the problem of requirements as something that is eventually known and is fix-able given close enough interaction with the users. I wish that were true.

Beck might suggest that this ‘unknown’ is just another form of change than can be managed in the normal manner, but my guess would be that the magnitude of changes required by people that don’t have day-to-today access to the team would be too big and would break the process.)

Or perhaps this is just a size issue. With a system, with a small number of well-defined users there’s no reason it couldn’t work.

Conclusion

I think my initial and immediate dislike of XP stemmed from the fact that most of the projects I’ve ever worked on have fallen into the “won’t work” camp. Now that I understand its background I can appreciate it much more.

And it’s only fair to note that this appreciation comes solely from the book. It’s worth reading even if you don’t plan on implementing Extreme Programming.

The facts

Author: Kent Beck

Cost: $29.95

ISBN: 0-201-61641-6

C

Introduction

Talking about C is not easy. Almost all professional programmers have used it at some point and many have a strong attachment to it. I don’t want to start by saying that it’s a poor language, alienating much of my audience, but I figure I’m going to end up doing that anyway so I may as well get it out of the way at the beginning.

Compared to many languages that have come since, and even some that came before, C just isn’t a very good language.

There, I’ve said it.

History

Perhaps more than almost any other language since, C has a rich and famous history. It’s almost impossible to discuss it without also talking a little about the history of Unix.

In 1969, Ken Thompson finally got hold of a PDP-7 and decided to write an operating system for it. (It happened more often than you’d think back then.) That operating system was Unix.

A few years later, Dennis Ritchie designed and built a language based on B (itself based on a language called BCPL) and, creatively, called it C. Unlike its immediate ancestors, C had types and a number of other useful bits and pieces.

C was so useful that Unix was quickly rewritten in it. These days that sounds obvious, but until that point operating systems had always been written in assembler or machine code. This is an important part of computer history.

Utility

C became popular largely because it filled a very useful niche. At one extreme you have all the ‘real’ languages. At the time, real computer scientists would have used Algol68. Structured and clever, Algol was a great language but you couldn’t do anything very low level with it.

At the other extreme there is assembler which people had to resort to if they wanted to do anything close to the hardware. Assembler is only one step removed from machine code which makes writing reliable, bug-free code very difficult, especially when you’re building something as large and complex as an operating system.

C fits in the middle. Described by some as a high-level assembler, it allows you to do low level coding, accessing particular memory addresses and the like, and use high-level constructs such as functions and types.

The following books and papers helped me learn to hate C.

Practical C Programming” by Steve Oualline.

Writing Solid Code” by Steve Maguire.

Code Complete” by Steve McConnell.

Hello world. How many nasty ways can you write “Hello World” in C?

Pointers

Key to C’s ability to mix high- and low-level constructs are pointers. Most ‘serious’ languages have some concept of a them. Some call them references, some call them links, but they all, basically, refer to something that identifies a chunk of memory. Most other languages only use them when you have to, but they are C. No pointers, no language.

You want an array? That’s really a pointer. Pass by reference? A pointer. Strings? Ah, they’re arrays! (Which are pointers.)

The advantages of using pointers for just about everything in the entire language are mainly one sided: it makes writing compilers easier. For the poor souls that actually end up using the language all is not so rosy. As Steve McConnell puts it, “pointers are one of the most error-prone areas of modern programming” (Code Complete, section 11.9).

Some of the side-effects of using pointers are not immediately apparent, either. In most languages, arrays have bound-checking (the ability for the language to raise an error if you try to access an element that doesn’t exist). But C doesn’t really have arrays, it has pointers and a little syntactic sugar that makes it look like it has arrays. Pointers don’t know how much memory is being pointed to so you don’t get bound-checking. If you’re lucky your program causes a segmentation fault, if not you might corrupt other data or, on some operating systems, your program.

It takes all types

Another one of C’s biggest problem is it’s typing. As most people will already know, C allows you to put numbers into character variables, integers into floating points and any number of other nasty combinations. To C they’re all valid, but what they do are not always well defined or consistent. Even the same compiler sometimes does different things depending on the level of optimisation in use, the phase of the moon, etc.

The odd thing is that a weakly typed language doesn’t allow you to do more than one with strong types, it merely allows you to do the wrong thing more easily. For a language used for large-scale software engineering projects the risk of poor code is just too great and weak typing should be outlawed.

I’m sure that people are going to mention the myriad of warnings that modern compilers are able to produce, or that ‘lint’ has been available for nearly as long as the language has been. My counter-argument: I don’t see why you should have to add extra tools or read through pages of warnings in order to correct deficiencies in the source language!

The good bits

If C was truly as appalling as I’ve made out so far, no-one would actually be using it. The main ‘win’ for C, as far as I can see, is that it is small, well defined and widely available.

All three merits are in many ways different sides of the same coin (if you can imagine a three sided coin). To make the language well defined, it helps if it’s small. If lots of people are to use it, it needs to be simple enough that they aren’t put off (Ada anyone? Thought not.). Small and well-defined make it easier to write compilers too, meaning that it’s available on everything from the lowliest PC right up to mainframes.

The theory also goes that your programs should recompile on this range of machines too, but that’s not as true as we’d all like to think. If it was true, we wouldn’t need Java or hundreds of ‘#ifdefs’ throughout. My code tends not to be that low-level, but I wouldn’t like to make any promises about its portability. However, that doesn’t make C’s wide availability useless. Even if your program isn’t cross-platform, the skills required are. A C programmer can quickly write code for just about any machine.

Summary

I’ve probably written more lines of C code than just about any other language, so when I say that I don’t like it I hope that you can see that I’m not being narrow-minded or prejudiced.

As I’ve mentioned above there are some things that I like about C, it’s just that there is so much to hate about it and, even at the time it was written, there wasn’t an excuse for it!

The languages main features are its weak typing and over-use of pointers, both of which allow developers to make truly horrendous coding errors with ease.

If those were C’s only problems I might be able to forgive it, but they aren’t. C has more ways for both experienced and novice programmers alike to hang themselves than any other language I can think of (with the possible exception of Intercal). Even if it didn’t have the ‘=’ and ‘==’ operators to confuse, there’s still the wonderful ‘?:’ and a whole host of spectacularly error prone API calls (does all your code check the value that ‘malloc’ returns?)

No, C, as a language, is a dinosaur. It deserves to whither and die. If you write anything other than an operating system kernel and use C, switch to another language. You’ll be far more productive when you start battling the problem rather than the language.

Output

Introduction

PL/SQL is good for many things. It has a structured, Ada-like syntax and can have SQL statements seamlessly embedded in it. Although not compiled right down to machine code, it’s even fairly quick.

But one thing that most people initially have difficulty doing is displaying information on the screen. The first thing I did was to enter what I wanted into a temporary table and use SQL*Plus to select from it. This works well for reports at the like, but it doesn’t work at all if you want to display something independently of a transaction or halfway through a function.

I was frustrated that there wasn’t a better way of feeding back information to end users. I was wrong to be frustrated: there is a better way.

Hello world!

The best example I can think of doing is the world famous “Hello, World!” program, so here goes:

CREATE OR REPLACE PROCEDURE hello_world AS
BEGIN
     DBMS_OUTPUT.PUT_LINE ('Hello, world!');
END hello_world;
/

In SQL*Plus you can run that procedure by typing “exec hello_world” and pressing return.

You may be surprised to see that it says “PL/SQL procedure successfully completed.” and nothing else; there’s no “hello world” text. Surely it’s lying, it hasn’t successfully executed anything, has it?!

As with most things Oracle, things are not quite as simple as they first appear. In this case you need to tell SQL*Plus that you’re going to use the DBMS_OUTPUT package. To do that, you need the following command:

set serveroutput on

You might want to put this in your “login.sql” file so that it gets executed every time you log in. You can also add the “size” parameter to the end to indicate how much output you’ll be producing. The most you’re allowed is around 1Mb so I usually type “set serveroutput on size 1000000”.

If you try executing the procedure now, you’ll find that it works correctly.

A longer example

For our second and final example, we’ll use one of the tables that Oracle installs by default. It’s called EMP and should be in the SCOTT schema (password TIGER). The important fact here is that there are a number of different columns types, NUMBER’s, VARCHAR2’s and DATE’s.

On to the example:

CREATE OR REPLACE PROCEDURE example2 AS
     CURSOR emp_cur IS
         SELECT e.empno, e.ename, e.sal, e.hiredate
         FROM   emp e
         WHERE  e.sal > 2500;
    BEGIN
     FOR current_employee IN emp_cur LOOP
         DBMS_OUTPUT.PUT_LINE ('-------------------');     -- string literal
         DBMS_OUTPUT.PUT_LINE (current_employee.empno);    -- number(4)
         DBMS_OUTPUT.PUT_LINE (current_employee.ename);    -- varchar2(10)
         DBMS_OUTPUT.PUT_LINE (current_employee.sal);      -- number(7,2)
         DBMS_OUTPUT.PUT_LINE (current_employee.hiredate); -- date
         DBMS_OUTPUT.PUT_LINE ('Year: ' ||
                               TO_CHAR(current_employee.hiredate, 'YYYY'));
     END LOOP;
 END;
 /

This code should be pretty much self-explanatory. We first start a loop over a table. Inside the loop we output the content of it. The first PUT_LINE statement shows that you can output a string literal (we know this as ‘Hello world’ is one too). The second shows that you can output numbers without first converting them to a string — the function is overloaded and will accept most standard types.

Following that theme, the third, fourth and fifth lines output a string, a different number and a date, in all cases using the default format string. The last PUT_LINE statement shows that you can output expressions, but note that it all needs to be of the same type (so in this case I convert a date to a string so I can concatenate it to another string).

On my machine, the output is as follows:

SQL> set serveroutput on
SQL> exec example2
 -------------------
 7566 
JONES 
2975 
02-APR-81 
Year: 1981 
------------------- 
7698
 BLAKE 
2850 
01-MAY-81
 Year: 1981 
------------------- 
7788 
SCOTT 
3000 
19-APR-87
 Year: 1987 
------------------- 
7839 
KING 5000 
17-NOV-81 
Year: 1981 
------------------- 
7902
 FORD 
3000 
03-DEC-81 
Year: 1981
PL/SQL procedure successfully completed.  
SQL>

Conclusion

As we’ve come to expect from Oracle over the years, simply outputting a little text to the screen isn’t quite as easy as you might wish for.

And it’s not without it’s problems. If you want to send more than a megabyte to the screen, you’re still going to have difficulty and all the playing around with “set serveroutput” is just plain tedious.

However, the important point is that you can do it and, programatically, it’s (basically) just a single command.

If you want more information on this kind of thing, have a look in chapter 6 of Oracle Builtin Packages, a whole book dedicated to this kind of thing.

A matter of style

Introduction

The open source community does a lot of things right. The internals of a program is one of them. The people who write code do so because they are proud of what they do and want the respect of other people in the community. The beauty of the code is a very important aspect in this acceptance.

The same isn’t necessarily true in the commercial world. Time-scales are much more important than how the guts of a computer system looks and it’s generally not good to be seen spending a bit of time making your code look pretty.

This column is not going to say that style is more important than substance, but that this interest is something that can bring a large increase in productivity.

Indentation

I know people that hate me for this, but something that really bugs me about some code is the indentation style. I can handle a poorly thought-out, hacked algorithm if there’s a good reason (time usually), but I can never see a good excuse for badly formatted code.

Initially I thought I was being anal, just getting very hung-up on a not-very-important detail, but then I started noticing things. Usually the people that produced the worst formatted code also included the largest number of defects and, although they initially appeared to write code more quickly than me, finished late much more frequently.

Why does that matter?

There are two fundamental reasons why the ‘look’ of the code matters:

  1. Comprehension. How quickly can people understand the code?
  2. Structure. Is it obvious how the program is split into units? Is it obvious what the next statement to be executed is?

I guess they are quite similar, but I feel that it’s important to separate them out. Hopefully you’ll see why in a minute.

Let’s talk about comprehension first. Here’s a snippet of code:

if (x < foo (y, z))
   haha = bar[4] + 5;
 else
   {
     while (z)
       {
         haha += foo (z, z);
         z--;
       }
     return ++x + bar ();
   }

If you've read the GNU coding standards, you might recognise this as an example of good formatting. It isn't.

But there are some redeeming factors. Firstly they've used two spaces for each level of indentation and not a tab. Studies have shown that while people prefer (aesthetically speaking) tabs in programs, those same people actually understand the code more effectively with between two and four spaces. Also, the formatting of the mathematical expressions is clear, with good use of whitespace.

Now the bad. My main problem is with the use of the braces. The lesser crime is immediately after the if condition where they haven't used braces at all. That's bad because a novice programmer might add an extra line below the 'haha' assignment. It probably won't be a valid program any more, but it looks okay at first glance. Since maintenance is the largest part of the software life-cycle, anything that makes updating code more difficult is bad news.

Structure

The indentation of the second half is dreadful, though. Why have two levels of indentation to to indicate one block of code? Again, there are studies that show that formatting like this actually reduces the readability of the code.

My preferred method of formatting the same code would be:

if (x < foo (y, z)) {
   haha = bar[4] + 5;
}
else {
   while (z) {
     haha += foo (z, z);
     z--;
   }
   x++;
   return x + bar ();
}

The main difference is the formatting, but I've also altered the 'return' statement at the end. Side-effects (like using '++x' in an expression) seriously reduce readability as it's much more complex to figure out what it's trying to evaluate. And the solution only takes an extra line...

But why is this a better way to format the code? Simple: it shows the structure of the code more effectively. Since there are only three levels of indentation, your brain doesn't have to work as hard to figure out where, for example, the end of the "else" block is. (It's not so easy to see the advantage with ten lines of code. Try to imagine a page full of code with a large number of indentation levels.)

Some languages appear to be more tricky than the C example here, too. Where does the exception block in Java go? Should the procedure definitions in a package header be indented in PL/SQL?

If you don't really understand block-structure in modern programming languages and are just trying to follow the example of fellow developers, these are probably difficult questions. They're not supposed to be.

A computer will understand any syntactically valid program, but not necessarily in the same way that you or I would. This makes it vitally important that everyone involved has a common understanding of how the program is supposed to work. The first steps to doing this are structuring the code sensibly and making it easy for others to comprehend.

The last important comment is about timing: none of this takes any longer than building your code with poor formatting and structure. Why? Well, if you understand how it's structured it'll be more likely to work the first time. And if it doesn't, you should be able to find the buy more quickly because your code is easier to comprehend.

Summary

To summarise, the neatest, best formatted code takes no longer to write and is much easier to maintain than code that has bad 'style' (although it's much less likely to need fixing).

So the next time that you come across some dreadful, untidy code, try to make the person that wrote it start again. If they don't understand how their own code is structured, neither will anyone else.

You'll note that I keep mentioning studies, but don't reference the source. That's because I've read all about it in Code Complete (follow the link to by it from Amazon.com) and not from the horses mouth. I recommend you also read it for the full story.

Perl

Introduction

Many developers would hate to have their master-work described as a mess, but not Larry Wall, creator of Perl and celebrity hacker. The way he sees it, the language is a mess because the problem domain — real life — is also a mess. He has a point.

I first came across Perl a few years ago when I was writing a program that required a certain amount of ‘screen-scraping’ from a telnet session, the ability to retrieve files using FTP, some complex processing and interaction with an Oracle database. This is a fairly messy problem, and one that Perl looked eminently able to solve.

Originally I came up with a design involving precarious shell-script creations, plus PL/SQL and a pile of other logic thrown in for good measure. This is even more messy and, worse, I can’t see how it would ever have worked. Then, at various times, people would suggest Perl. It’s good at text handling one would say. You can connect it to a database another would say. But it was the telnet library that sold it to me. I still hadn’t figured out a reliable way of doing that in a shell script.

The philosophy

With the tools that I already had, the problem may well have been impossible. I could have done it in C, but I didn’t have the luxury of time. I might have figured out a way of doing it with shell scripts, but it’d be a nightmare to debug and support.

It sure looked possible in Perl, and it had to be easier than doing it in C.

The Perl philosophy, it turns out, is make the every day things easy and everything else possible. The kind of thing that I wanted to do wasn’t exactly typical Perl fare, but it had all the right elements.

Much of the time I use Perl like a super-shell script. Thinking of it as such is not completely wrong, but it does do the software a great disservice. Sure, you can use it like that. But you can also make ‘real’ programs, complete with declared variables, objects and a GUI user interface.

You can do a lot in Perl, and it doesn’t try to cramp your style. Want to use objects? Go ahead! Think they look too complex? You don’t have to use them! You can write programs like glorified shell-scripts, or like C. Roll your own, use a library, the choice is yours.

This neatly brings me to another one of Perl’s philosophies. (The Perl definition of a philosophy is as messy as the language and the problem you end up solving with it.) This other message is: There Is More Than One Way To Do It.

Let’s try a trivial example: the if statement. Most languages insist you do something like this:

if ($a == 5) {
  print "a is five!n";
}

Indeed, this is perfectly valid Perl. But so is this:

print "a is five!" if ($a == 5);

I don’t necessarily think that this represents good programming style, but then I don’t have to use it. My choice.

The syntax

As you can see in the above example, Perl bears more than a passing resemblance to C. It uses the double equals (“==”) to compare numbers and semi-colons as statement terminators.

It’s also a bit like Unix shell scripts. Note the use of the dollar to identify variables. However, Perl is much more insistent on their use than Bourne. You must use the dollar all the time, even in assignments. This gets annoying if you change between languages with any degree of regularity.

The good news is that Perl doesn’t always use a dollar sign to identify variables. The bad news is that it also uses the at sign (@), the percent sign (%) and the ampersand (&). (At least these are the common ones. There was a joke going round recently suggesting that Perl 5.6 only supported Unicode character because they’d run out of symbols on the normal keyboard. At least I hope it was a joke.)

Fortunately it’s not completely indiscriminate. It turns out that Perl has only three data types: scalars, lists and hashes. The dollar identifies scalars, variables that accept a single value, at’s are used when you want to put several values in a single variable and the percent sign is used to identify associative arrays (which Perl calls hashes).

For example:

# Scalars
$a = "hello";
$b = 1.2;

# Lists
@c = ("hello", "goodbye");
@d = ($a, $b);

# Hashes
%e = ( "var1" => "value 1", "var2" => "value 2" );

# Output
print "$a, $c[0], ", $e{var1}, "n";

There are several interesting things to note here. Firstly, you don’t have to declare your variables (although you can if you want). Secondly, scalars store any single value, whether number, character or string. They are weakly typed in the same way that Variants are in Visual Basic; the system knows what’s there and will do different things based on that information. We’ll see more of this later.

Lists can store any number of scalars (which, as you can see, don’t all have to be what most other languages would consider the same type), with Perl performing all the memory allocation and deallocation, much the same way as the much-vaunted Java garbage collector. Many of the same properties apply to hashes.

Perhaps the final interesting property is the way you read from the non-scalar types. As you can see, you must use the dollar — scalar — to access them. This does make sense when you stop to think about it: you’re reading a single value not the complete list (which is still represented using the at sign).

The Perl Difference

All this variable stuff is unusual, but it doesn’t make it stand head and shoulders above everything else (in fact, the weak-ish typing and lack of user-defined types make it much worse than many others). But Perl is used in just about every CGI script in existence for a very good reason.

Is it the same reason that Perl has a reputation for being unreadable. Generally Perl is no less readable than, say, C but the one aspect that confuses just about everyone the first time are regular expressions. If you like code that looks like line-noise, this is the feature for you!

Regular Expressions are a way of representing patterns. For example this line could represent UK postal codes:

^[A-Z]+[0-9]+ [0-9][A-Z]+$

Or in English: one or more letters at the beginning of the line, followed by some digits. Then there’s a space followed by a single digit and one or more letters. The pedantic might note that all the letters must be uppercase.

Like all the usual Unix command-line tools, Perl allows you to look for and manipulate patterns in files. Perl extends the usual array of tokens allowing you to make fantastical expressions that quickly become completely unreadable. The power comes from the fact that all this is embedded right into the language, no clumsy function calls.

while ($line = <>) {
  chomp $line;
  if ($line =~ /1234/) {
    print ":$linen";
  }
}

Note here that in the third line I compare the input (“<>” reads the next line from the currently open stream, which is stdin by default) to an expression. Clearly this is a very simple example, but you should be able to see that having this functionality builtin gives the language the ability to express some very complex ideas concisely.

Extras

One of the things that Perl does better than just about any other language is plug-in modules. Perl 5 added some clunky object-oriented-like features and, while they may not be elegant, they do seem to work.

A testament to their power is the number of modules available for free download at CPAN (the Comprehensive Perl Archive Network). There are modules for connecting to just about any relational database, libraries to talk to all the Internet protocols I could think of and code to deal with XML and configuration files (or XML configuration files). There are so many modules you rarely have to write much in the way of code yourself.

Summary

I’ve barely begun to scratch the surface of Perl here. There’s much more to it, but the beauty of the language is that you don’t actually need to know that.

The language itself has a tendancy to look ugly and, often, unreadable. But it’s used just about everywhere you can find a Unix box. There are few other languages that come close to Perl for hacking text files and automating boring system administration tasks. It takes over where shell scripts leave off and only starts to run out of steam for projects thousands of lines long (it’s technically able to cope with more, but it’s not a real software engineering language). System Administrators don’t call it a Swiss Army Chain-saw for no reason.

As a computer language purist, I really want to hate Perl. It has weak typing, no data structures, it’s proud of the fact that there’s more than one way to do everything and the syntax just looks plain ugly. But this is not a pure or perfect universe. These very ‘flaws’ make Perl the ideal tool for the jobs it was designed for.

I may not actually like it, but I use Perl for just about all my hacking activities. It’s just too useful to ignore.