All posts by Stephen Darlington

Free Software HOWTO

v1.2, 17 January 2001

With the current Linux trend towards multi-million dollar IPOs and “Open Source” software, much of the emphasis of “free” software has been lost, leaving people new to the fold confused and not fully understanding all the implications. This HOWTO will, hopefully, reduce some of that confusion.

Introduction

What’s in here?

This document talks in non-technical terms about free software, what it means and why you should care about more than just the cost of your software.

Who is this HOWTO for?

If you've been hacking for years, you will already be fully au fait with the content of this document. Or at least you should be!

The Free Software HOWTO is aimed at people new to Linux, Open Source or free software.

New versions of this document

The official home of this HOWTO is here. You will always find the most up to date version here.

Disclaimer

You get what you pay for. I offer no warranty of any kind, implied or otherwise. I’ll help you where I can but legally you’re on your own.

Credits and Thanks

I welcome any constructive feedback on this HOWTO and on general software licencing issues, although my opinions are just that: a subjective view. You should understand what each licence means before committing to one for your own software or documentation.

Licence

This document is copyright 2000, 2001 Stephen Darlington. You may use, disseminate and reproduce it freely, provided you:

  • Do not omit or alter this copyright notice.
  • Do not omit or alter the version number and date.
  • Do not omit or alter the document’s pointer to the current WWW version.
  • Clearly mark any condensed or altered versions as such.

These restrictions are intended to protect potential readers from stale or mangled versions. If you think you have a good case for an exception, ask me.

(This copyright notice has been lifted from Eric Raymond’s Distribution HOWTO.)

Overview

Introduction

Until now, you’ve probably never given much thought to software licences. No-one can blame you. They’re usually pages and pages of legalese telling you what you can and can’t do with the CD you just bought. If you actually sat down and read it all, you would probably never agree to it!

Licences, however, are at the heart of what free or open-source software is all about. Let’s take a look at a number of very broad categories of licence.

Commercial software

The first type is the commercial licence, like the one that Microsoft or Lotus might insist you agree to before using their software. The basic premise is that you don’t own the software; instead, you have an agreement with the author that allows you to use it within certain guidelines. As the copyright owner, they can impose whatever restrictions they want. Common conditions include limits on the number of concurrent users, number of copies, and what you can use it for (for example, “non-commercial use” or a ban on reverse engineering).

The emphasis is what you can’t do, and all the power is in their court!

One thing to note, and this will become more relevant later on, is that none of this relates to the cost of the software (or, more properly, the licence). You can get commercial software, such as Microsoft Internet Explorer, for no cost and still be forced to abide by the publisher’s conditions.

Shareware

The next kind of software, shareware, is sometimes called free, although as we shall see, that’s not correct.

Shareware was started in the early eighties by Jim Button when a few of his friends started passing round some software he’d written and he started getting phone calls asking for support. Eventually he added a request for $10 to the startup screen and shareware was born.

In short, shareware authors allow you to try out their software for free but request payment for continued use. Many of the same restrictions as for commercial software remain, such as limits on reverse engineering and on concurrent use of the full version. Additionally, the “free” downloads are often broken in some way, perhaps with limited functionality or splash screens.

Interestingly, Linus Torvalds describes shareware as combining “the worst of commercial software (no source) with the worst of free software (no finishing touches).”

Public Domain software

All the licences we’ve seen until now have been designed to reduce people’s ability to do what they want with the software.

At the opposite extreme is public domain software. This is software that has no copyright and, therefore, no restrictions on its use. You can copy it onto as many machines as you like, reverse engineer it, make modifications and distribute it as you see fit.

This is the first kind of software that we’ve come across that can rightly be called “free.”

Normally, whenever you write something you automatically own the copyright, even if you don’t add an explicit copyright message. For public domain software, the author throws away these rights, allowing everybody to do what they like with the software.

Unfortunately, “anything” also includes selling it. Imagine spending a huge amount of time producing your masterpiece, giving it away and then finding that someone else was able to sell it and make their fortune with all your hard work! Worse, people don’t even have to give you credit for your work; they can legally take it, replace your name with theirs and distribute it.

This might be why there isn’t a huge amount of public domain software.

At this point it might be worth looking back at free commercial software. Public domain software and a free piece of commercial software cost the same, but the freedom you’re given to use them varies. A common phrase you hear with free software is “think free speech not free beer.” Public domain software and free commercial software show the two opposite extremes.

Free software

By now it should be clear that there are many different kinds of free software and not all are equal. The version of free that this section relates to is the one promoted by Richard Stallman of the Free Software Foundation.

But first, a little history.

The roots of the Free Software Foundation go back to the early 1980s, when a printer manufacturer refused to give Richard Stallman the source code (the computer instructions required to make a program). It had been leading up to this for some time (the increasing number of non-disclosure agreements, new software that banned sharing of information, etc.) and the printer manufacturer was just the straw that broke the camel’s back. Stallman began work on the GNU Project in 1984 and formed the Foundation in 1985.

Stallman decided that he could not, in good conscience, sign a non-disclosure agreement or work with a company that restricted his ability to share information. While most people would have given up at this point and gone to work for a big company writing proprietary software, for obscene amounts of money, Stallman stuck by his principles and decided to write his own operating system, free of the constraints of commercial systems.

The process leading up to this is documented in more detail in Steven Levy’s excellent ‘Hackers.’ You’ll note that Levy calls Stallman ‘the last real hacker.’ Happily he was wrong, otherwise you wouldn’t be reading this!

As we can tell from the background, “free,” in this context, relates not to the initial cost but to freedom. Stallman was unwilling to surrender the right to make modifications or improvements to any software he used — and to do this you need the source code.

This may sound just like public domain software up to this point. The difference is that there are clauses in the licence that attempt to keep the software free no matter what changes are made. The most famous free software, Linux, uses the most famous free software licence, the GNU General Public Licence (GPL). It is sometimes also called Copyleft, as it very cleverly uses copyright law to do the exact opposite of what copyright was designed for.

The way it does this is by insisting that the code, and anything derived from it, is also released under the GPL. In some senses it is ‘viral’ in nature, and it is this that is central to many people’s objections.

Also, it’s worth noting that the word ‘derived’ is a little too vague. Does a library linked to a GPL’d program need to be GPL’d also? Does a program running on a free operating system need to be GPL’d?

There’s no clear, obvious answer for either of these with the current version of the GPL. The new version (3) is intended to fix some of these shortcomings, but its viral nature will remain.

Open Source software

Open Source software is, in many ways, exactly the same as free software, despite what Richard Stallman says!

It was started in 1998 by Eric Raymond and Bruce Perens as a response to the increasing confusion over the use of the word “free” in relation to software. (Confusion that has continued, or I wouldn’t be writing this document!)

In essence, Open Source is a marketing or PR exercise to make free software more acceptable and better understood by the general public and by the big corporations that, until then, had been comforted by the money they had to pay for commercial software.

Like “free” software, the “Open Source” trade-mark does not mandate a single licence. (Freedom of choice is important, even when it comes to giving away software!) Technically any licence that meets all the requirements in the Open Source Definition can be termed Open Source. In summary, these are: no restrictions on the use of the software, access to the source code, and the freedom to make modifications and distribute them.

The two most famous are the Berkeley Software Distribution (BSD) Licence, which allows distribution without the source code, and the GNU General Public Licence, although there are many more.

The Sun Community Source Licence, for example, is not compliant because Sun Microsystems demands a fee for any commercial distribution and insists that derivations are still compatible in some arbitrary ways. (This is not intended to single out the SCSL as being particularly bad. However, the fact that it purports to be “open” when it isn’t is a disturbing trend.)

The implications

Overview

Free software has already had a significant effect on the computer industry. It is behind most of the critical parts of the Internet, it was used (unsuccessfully) as part of the defense in the Microsoft anti-trust trial, and the spate of recent IPOs has shown that there’s money in it too.

Unless you bought some shares, this all appears to be affecting other people. There is an impact, however.

I just use software, how does this all affect me?

It’s tempting to think that, if you’re not a ‘techie’ or a hacker, the difference between free commercial, public domain and Open Source software is minimal. That’s not true. Even as a user, the difference affects you, because it shapes how the software is developed.

I’ll outline a plausible scenario as an example.

Company X designs and releases a fantastic piece of software. It’s commercial software, but the publisher seems receptive to new ideas; indeed, versions 1.1 and 2.0 are exactly what you were looking for and the upgrade costs were reasonable.

At this point any number of things can happen. Perhaps you find a bug in it, one critical to your business. But the publisher offers no warranty for the software and says it isn’t sure whether it will fix it. (Note that just because you paid for the software, you do not necessarily get better service or a guarantee of any kind. According to the licence, you can’t sue Company X if the software is not fit for the purpose it was sold for.)

Or maybe they go out of business. Or perhaps they start competing with your company and won’t sell you the next version. Or perhaps the features you want are not in the new version. There are a huge number of ‘if’s and ‘maybe’s, all outside your control.

Basically, if you use commercial, closed-source software you are at the whim of the publisher. If they do something you don’t like, tough.

However, if you’d used Open Source software you’d have access to the source code and could fix or upgrade the program as you saw fit. And if you couldn’t do it, there are many programmers willing to support the new versions or you could hire someone to do it for you. In summary, you have much more control over future development of the product.

You’ll note that none of the advantages here are strictly related to cost. I think that’s something that people tend to focus on too much, possibly due to the “free software” title. However, there are still cost advantages.

But first we need to get away from the initial cost of the software as that’s normally a small percentage of the total cost. Instead, let’s think in terms of support costs. Once you’ve installed the software, what costs are there? An obvious cost is that of upgrades. Less obvious is lost productivity due to software failure and support charges from the manufacturer.

In the case of free software, there are no upgrade costs (other than the time and inconvenience of doing it, which also applies to commercial software). Free software is usually regarded as more reliable than commercial software; see Fuzz Revisited for more information. And the support charges are optional: you can either deal with it in-house or hire any one of a number of support organisations.

I’m a developer. How does this affect me?

As a developer, you’ll already know the benefits of being able to access the source code for a program. You can fix it, see how it works and integrate bits of it into your new program. (Using parts of another program, however, isn’t quite that simple and is dependent on the licence of the original software.)

A common problem developers see is the loss of their livelihood. If everyone gives away their software, how can anyone make money? It’s a fair question and there’s no single, correct answer.

Perhaps the most common answer is that most software is developed in-house and is not distributed. None of this development will be affected, so if you have that kind of job you can rely on your salary for a while yet!

If you work for a consultancy, almost all the revenue comes not from selling software but from “professional services,” i.e., they charge for developing the software rather than the licence to use it. Again your job is safe.

Then there’s shrink-wrapped software. The Free Software Foundation would say that it should be free. So if you listen to them and you work for Microsoft your job is in danger unless they diversify into services.

The Open Source people would say that there’s nothing wrong with shrink-wrapped software, but point out that there are advantages in releasing the source.

As you can probably see, the risk to your job is small and there are many benefits. At least that’s the way I see it, and I write software for a living!

Where can I read more?

The most obvious place to read more is the GNU website; after all, they started it.

But there are alternatives. Other important web-sites are as follows:

  • Open Source. This is the ‘official’ Open Source page. There’s lots of interesting stuff here, including a more detailed discussion of the effect of free software on different people, and Microsoft’s Halloween documents, their unofficial response to the Linux threat.
  • Eric Raymond’s web-site. Eric has written much about Open Source software, with much more depth and style than I can muster!
  • Slashdot. There aren’t any ‘HOWTO’s here, but Slashdot is a community of people interested in Open Source software. The discussions sometimes get childish, but you can learn a lot!

There are also a few books published (at least partially) on the subject:

  • Open Sources: Voices from the Open Source Revolution. This book talks about all aspects of the Open Source community, including licencing. The main reference is Bruce Perens’s essay.
  • The Cathedral and the Bazaar. Again, this book is about general Open Source issues, but it includes a discussion of some of the non-obvious implications of the licences, particularly why, even though anyone is allowed to create their own version of a program, you rarely see many competing versions of a piece of software.
  • Hackers. Steven Levy talks about the early days of the hacker community, including a good piece on Richard Stallman.

A matter of style

Introduction

The open source community does a lot of things right, and the internals of a program are one of them. The people who write code do so because they are proud of what they do and want the respect of other people in the community. The beauty of the code is a very important part of earning that respect.

The same isn’t necessarily true in the commercial world. Time-scales are much more important than how the guts of a computer system look, and it’s generally not good to be seen spending time making your code look pretty.

This column is not going to argue that style is more important than substance, only that paying attention to style can bring a large increase in productivity.

Indentation

I know people that hate me for this, but something that really bugs me about some code is the indentation style. I can handle a poorly thought-out, hacked algorithm if there’s a good reason (time usually), but I can never see a good excuse for badly formatted code.

Initially I thought I was being anal, just getting very hung-up on a not-very-important detail, but then I started noticing things. Usually the people that produced the worst formatted code also included the largest number of defects and, although they initially appeared to write code more quickly than me, finished late much more frequently.

Why does that matter?

There are two fundamental reasons why the ‘look’ of the code matters:

  1. Comprehension. How quickly can people understand the code?
  2. Structure. Is it obvious how the program is split into units? Is it obvious what the next statement to be executed is?

I guess they are quite similar, but I feel that it’s important to separate them out. Hopefully you’ll see why in a minute.

Let’s talk about comprehension first. Here’s a snippet of code:

if (x < foo (y, z))
  haha = bar[4] + 5;
else
  {
    while (z)
      {
        haha += foo (z, z);
        z--;
      }
    return ++x + bar ();
  }

If you've read the GNU coding standards, you might recognise this as an example of good formatting. It isn't.

But there are some redeeming factors. Firstly, they've used two spaces for each level of indentation and not a tab. Studies have shown that while people prefer (aesthetically speaking) tabs in programs, those same people actually understand the code more effectively with between two and four spaces. Also, the formatting of the mathematical expressions is clear, with good use of whitespace.

Now the bad. My main problem is with the use of the braces. The lesser crime is immediately after the if condition where they haven't used braces at all. That's bad because a novice programmer might add an extra line below the 'haha' assignment. It probably won't be a valid program any more, but it looks okay at first glance. Since maintenance is the largest part of the software life-cycle, anything that makes updating code more difficult is bad news.

Structure

The indentation of the second half is dreadful, though. Why have two levels of indentation to indicate one block of code? Again, there are studies that show that formatting like this actually reduces the readability of the code.

My preferred method of formatting the same code would be:

if (x < foo (y, z)) {
  haha = bar[4] + 5;
}
else {
  while (z) {
    haha += foo (z, z);
    z--;
  }
  x++;
  return x + bar ();
}

The main difference is the formatting, but I've also altered the 'return' statement at the end. Side-effects (like using '++x' in an expression) seriously reduce readability as it's much more complex to figure out what it's trying to evaluate. And the solution only takes an extra line...

But why is this a better way to format the code? Simple: it shows the structure of the code more effectively. Since there are only three levels of indentation, your brain doesn't have to work as hard to figure out where, for example, the end of the "else" block is. (It's not so easy to see the advantage with ten lines of code. Try to imagine a page full of code with a large number of indentation levels.)

Some languages appear to be more tricky than the C example here, too. Where does the exception block in Java go? Should the procedure definitions in a package header be indented in PL/SQL?

If you don't really understand block-structure in modern programming languages and are just trying to follow the example of fellow developers, these are probably difficult questions. They're not supposed to be.

A computer will understand any syntactically valid program, but not necessarily in the same way that you or I would. This makes it vitally important that everyone involved has a common understanding of how the program is supposed to work. The first steps to doing this are structuring the code sensibly and making it easy for others to comprehend.

The last important comment is about timing: none of this takes any longer than building your code with poor formatting and structure. Why? Well, if you understand how it's structured it'll be more likely to work the first time. And if it doesn't, you should be able to find the bug more quickly because your code is easier to comprehend.

Summary

To summarise, the neatest, best formatted code takes no longer to write and is much easier to maintain than code that has bad 'style' (although it's much less likely to need fixing).

So the next time that you come across some dreadful, untidy code, try to make the person that wrote it start again. If they don't understand how their own code is structured, neither will anyone else.

You'll note that I keep mentioning studies, but don't reference the source. That's because I've read all about it in Code Complete (follow the link to buy it from Amazon.com) and not from the horse's mouth. I recommend you also read it for the full story.

Death March

Introduction

Perhaps more than in any other engineering discipline (see Steve McConnell’s After The Gold Rush), software engineers work on projects that have no real chance of success. There are as many reasons why as there are projects, but if you want to be in with a chance of surviving such a ‘death march’ this could be the book for you.

Content

Edward Yourdon is a well known and well respected computer scientist, so what useful information can he give you in these circumstances? Surely you’re lumbered with the simple choice between putting up with it or resigning?

Well, no. The book explains that there are any number of things to do, and not just for the project stake-holders. There are things that just about anyone on the project — and indeed just outside the project — can do. And quitting is almost always one of the options he gives. I find this interesting because most books tend to argue that you can fix anything. Sometimes you just don’t have the authority to do anything that would make a significant enough change.

Of course, it’s a two-hundred page book, so it doesn’t just launch into this resign-or-fix discussion. First he talks about what a Death March project actually is, and then moves on to finding who the key players in the project are. These people are not always those that you think should be in charge! For example, the CEO’s golfing partner is often in a position of power and influence, although you won’t find them in the organisation chart. (I’ve seen this kind of dynamic in play, but I hadn’t really thought about it in these terms.)

He then moves on to negotiating the best deal for you and your team in this bad situation. You may not be able to get your boss to accept a rational argument at the beginning (or even towards the end) of the project, but you should at least try. And these are the arguments to use!

Motivation, both among the various clients and within your own team, plays an important role in the success or otherwise of the project, and is discussed in some detail. One vaguely controversial statement is that we all need to be involved in politics to some extent. I agree with the ‘why’ (even your boss may secretly want your project to fail), but I’m less sure about the ‘how’. Many, maybe most, of the developers I know have absolutely no interest in politics and try to pretend that it doesn’t affect them!

The next two chapters talk about methodologies and tools, and their applicability to death march projects. The last chapter discusses integrating the death march into your company’s culture (most of your projects are going to be like that anyway, so you may as well get used to it!).

Controversy!

It’s not all good news, though. Some of the chapter on staff motivation is hard going (or at least would be for the people on your project). One of Yourdon’s correspondents suggests that, on a death march project, people should be putting in at least 60 hours a week! I know that some people do that, and that it is encouraged at some companies, but I really don’t think that people should be encouraged to do that on a regular basis. It’s only fair to say that Yourdon goes on to say that people working over 60 hours a week need to be watched closely, but by then the damage may already have been done.

Generally, however, the advice given is very pragmatic. I’d like to think that most of it was obvious, but it isn’t. This is the kind of information you probably realise only after years in the business.

Overall

I’m sure you can guess by now: I’m impressed. Most Computer Science books are not this sensible and are frequently based on research in university labs rather than commerce. In fact, I’m pretty certain that I’ve never seen a book that recommends that you resign in certain circumstances!

It’s not just the detail that makes this an important book. Yourdon backs up his assertions with examples and emails from colleagues that discuss some of the options available.

If you work in IT, sooner or later you will end up working on a Death-March project. This book is just what you need to be able to tell what chance of success it has and whether you and your organisation will survive it. Highly recommended.

Details

ISBN: 0-13-014659-5

Price: $16.99

Buy this book from Amazon.com or Amazon.co.uk.

Installing Oracle 8i R2


Introduction

Everyone will be very pleased to hear that Oracle’s third attempt at producing a usable database product on Linux has largely been successful. The first two usually worked, but only after much aggravation. Forget all the extras that 8.1.6 provides; the real improvement is that you can get the thing installed with much less grief!

Of course, I wouldn’t go so far as to say that it was simple and straight-forward all the time. It is Oracle that we’re talking about here.

I’ll start by describing how I got Oracle installed on my box and finish off with some questions and answers, much in the same format as the HOWTO. It’s probably worth having a look at the HOWTO still as many of the problems are similar and the solutions given there may give you some idea of where to start looking.

My machine

First, some news on my ‘server’ configuration: I still have the same Celeron 466 with 128Mb of memory. On the software side I’ve upgraded to Mandrake 7.1 (if I’d been running a production Oracle server I wouldn’t have taken the risk). I didn’t remove my old installation of Oracle before starting on the new one and I didn’t attempt to perform an upgrade.

I did remove my JRE (Blackdown 1.1.6v5) and my JDK (1.2.2) from my path. Oracle now comes with its own JRE, and even the risk of it picking up the wrong one made me paranoid.

The last thing to note is that this time I downloaded my copy rather than using a CD. This seems to be what most other people do, so my tale here should be closer to ‘real life.’

My successful install

The process was as follows:

  1. Download Oracle 8i R2
  2. Extract the archive
  3. Create the required users and groups
  4. Make sure X is set up correctly
  5. Start the installer
  6. Quick tests

Firstly, the download. It’s big, nearly 300Mb. Don’t attempt it without something like Go!Zilla or wget, even if you’re on a fast corporate connection.

Secondly, the extraction. You’ll find that it comes in a standard tar archive compressed with GNU Zip. This command should get all the files out:

tar zxvf oracle8161.tar.gz

When you extract it, remember that the extracted files take up about as much space again as the archive (301807K on my machine), so you need over 500Mb of disk space before you even start the installation!

Before you actually start the installation, you’ll need to switch to “root” for a couple of commands. Start by creating a group called “dba” and a user “oracle”. Your new user should be in the new group. Log in as your new “oracle” user and make sure your X Windows system is working properly. (If you can fire up a new ‘xterm’ you’re fine.) The Oracle installer, as before, works only in a graphical environment.

Go to where you extracted the software archive. You’ll find a directory has been created (“Oracle8iR2”). Move into it and you’re ready to start the installation!

(A quick note: in the same directory there’s “index.htm” which is the root page for all the Oracle installation documentation. This seems to be improved over earlier releases and is worth a read.)

Type:

./runInstaller

A splash screen should appear, followed by a Windows-style Wizard/Installer. I find the default options for almost everything to be fine. Broadly speaking, and assuming some common-sense is used, just clicking “Next” continually should result in a working installation. In slightly more detail…

(Note that there are a few points where the installer asks you to log in as “root” to run some shell scripts. To simplify the text below, I’ve missed these steps out. Simply do as it says and click “Retry” once it’s done.)

  1. Welcome screen. Click “Next.”
  2. File Locations screen. The top box should be correct; it displays the location of the archive containing all the software about to be installed. The second box is the “base” of your Oracle installation. I chose “/home/ora816” but this is not recommended. Have a read of the OFA (Oracle Flexible Architecture) documentation.
  3. Available Products screen. If you’re installing a server, select “Oracle 8i Enterprise Edition 8.1.6.1.0”; otherwise select “Oracle 8i Client 8.1.6.1.0”. I’ll assume that you’re building a server and have clicked the first option.
  4. Installation Types screen. Do you want a “Typical”, “Minimal” or “Custom” installation? Unless you really know what you’re doing, pick “Typical”.
  5. Upgrading or Migrating an Existing Database screen. If you have a previous installation, Oracle will ask whether you want to upgrade your database to the new 8.1.6 format. I didn’t. Even if you do want to upgrade, I’d recommend doing it yourself once the installation process is complete.
  6. Database Identification screen. Here Oracle asks you for a Global Database Name and a SID. As before, this is something your DBA probably has an opinion on. If you’re the DBA and you don’t know what it’s asking for, enter “dev1” for both.
  7. Database File Location screen. Now Oracle knows what you want to call your database, it asks you where you want to put all the files that make up the database. Think back to your reading of the OFA documentation for this.
  8. Summary screen. Oracle now tells you what it’s planning on installing. Click “Install” if you’re sure, or go to the “Previous” screen and juggle the options around.
  9. Configuration Tools. First Oracle runs the Net8 Configuration Assistant and then runs the Database Configuration Assistant. Basically, it sets up your networking and creates the database you asked for. No user intervention is required. (Note: the SYS account password is “change_on_install” and the SYSTEM password is “manager”. You should change both using the SQL*Plus “password” command as soon as possible.)
  10. End of Installation. That’s it, you have a complete installation!

If you want to install Oracle Programmer (Pro*C, etc.), go back through the installation process, but this time follow the “Oracle Client” route. The rest of the process is similar to the above and very straightforward. The new installer even asks whether you want to start again once your database has been created.

And if you want to set up a network connection to another machine, the process is exactly the same as for Oracle 8i (and is covered in the main HOWTO).

Questions and answers

Java problems?

As before, many of the problems come from your choice of Java Virtual Machine. R2 actually comes with its own runtime environment this time (JRE 1.1.8), which does make things easier. However, I have heard reports that older versions sometimes work better. The older version is normally Blackdown’s 1.1.6v5 release, the same one Oracle used to recommend for their 8.1.5 release.

Memory requirements

One thing that is exactly the same is the amount of memory required. I don’t remember seeing a figure in their old documentation, but this time they say you need 400MB, either real or virtual, for 8.1.6. I have 384MB in total on my machine and it was fine. The default database configuration seems to use more memory but, as always, you can change that.

Running Redhat 7 or another glibc 2.2 based distribution

Short answer: add “export LD_ASSUME_KERNEL=2.2.5” to your profile and then type “. /usr/i386-glibc21-linux/bin/i386-glibc21-linux-env.sh”.

Long answer: look at my page on the subject.

Perl

Introduction

Many developers would hate to have their masterwork described as a mess, but not Larry Wall, creator of Perl and celebrity hacker. The way he sees it, the language is a mess because the problem domain (real life) is also a mess. He has a point.

I first came across Perl a few years ago when I was writing a program that required a certain amount of ‘screen-scraping’ from a telnet session, the ability to retrieve files using FTP, some complex processing and interaction with an Oracle database. This is a fairly messy problem, and one that Perl looked eminently able to solve.

Originally I came up with a design involving precarious shell-script creations, plus PL/SQL and a pile of other logic thrown in for good measure. This was even messier and, worse, I can’t see how it would ever have worked. Then, at various times, people suggested Perl. “It’s good at text handling,” one would say. “You can connect it to a database,” another would say. But it was the telnet library that sold it to me. I still hadn’t figured out a reliable way of doing that in a shell script.

The philosophy

With the tools that I already had, the problem may well have been impossible. I could have done it in C, but I didn’t have the luxury of time. I might have figured out a way of doing it with shell scripts, but it would have been a nightmare to debug and support.

It sure looked possible in Perl, and it had to be easier than doing it in C.

The Perl philosophy, it turns out, is to make everyday things easy and everything else possible. What I wanted to do wasn’t exactly typical Perl fare, but it had all the right elements.

Much of the time I use Perl like a super shell script. Thinking of it as such is not completely wrong, but it does the language a great disservice. Sure, you can use it like that. But you can also write ‘real’ programs, complete with declared variables, objects and a graphical user interface.

You can do a lot in Perl, and it doesn’t try to cramp your style. Want to use objects? Go ahead! Think they look too complex? You don’t have to use them! You can write programs like glorified shell-scripts, or like C. Roll your own, use a library, the choice is yours.

This neatly brings me to another one of Perl’s philosophies. (The Perl definition of a philosophy is as messy as the language and the problem you end up solving with it.) This other message is: There Is More Than One Way To Do It.

Let’s try a trivial example: the if statement. Most languages insist you do something like this:

if ($a == 5) {
  print "a is five!\n";
}

Indeed, this is perfectly valid Perl. But so is this:

print "a is five!\n" if ($a == 5);

I don’t necessarily think that this represents good programming style, but then I don’t have to use it. My choice.
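To show how far There Is More Than One Way To Do It stretches, here is a small sketch of four equivalent ways of writing the same test. It uses a variable called $n rather than $a, because $a is special to Perl’s sort function; $n and the @said array are just illustrative names.

```perl
use strict;
use warnings;

my $n = 5;
my @said;   # records which forms fired, so the result is easy to check

# 1. The conventional block form
if ($n == 5) {
    push @said, "block";
}

# 2. Statement modifier: the condition follows the statement
push @said, "modifier" if $n == 5;

# 3. 'unless' is the negated form of 'if'
push @said, "unless" unless $n != 5;

# 4. Short-circuit 'and': the right-hand side only runs if the left is true
$n == 5 and push @said, "and";

print "a is five! ($_)\n" for @said;
```

All four fire, so four lines are printed. Which one you pick is purely a matter of taste.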

The syntax

As you can see in the above example, Perl bears more than a passing resemblance to C. It uses the double equals (“==”) to compare numbers and semicolons as statement terminators.

It’s also a bit like Unix shell scripts. Note the use of the dollar to identify variables. However, Perl is much more insistent on its use than the Bourne shell: you must use the dollar all the time, even in assignments. This gets annoying if you switch between languages with any degree of regularity.

The good news is that Perl doesn’t always use a dollar sign to identify variables. The bad news is that it also uses the at sign (@), the percent sign (%) and the ampersand (&). (At least these are the common ones. There was a joke going round recently suggesting that Perl 5.6 only added Unicode support because they’d run out of symbols on the normal keyboard. At least I hope it was a joke.)

Fortunately it’s not completely indiscriminate. It turns out that Perl has only three data types: scalars, lists and hashes. The dollar identifies scalars, variables that hold a single value; the at sign is used when you want to put several values in a single variable; and the percent sign identifies associative arrays (which Perl calls hashes).

For example:

# Scalars
$a = "hello";
$b = 1.2;
# Lists
@c = ("hello", "goodbye");
@d = ($a, $b);
# Hashes
%e = ( "var1" => "value 1", "var2" => "value 2" );
# Output
print "$a, $c[0], ", $e{var1}, "\n";

There are several interesting things to note here. Firstly, you don’t have to declare your variables (although you can if you want). Secondly, scalars store any single value, whether number, character or string. They are weakly typed in the same way that Variants are in Visual Basic; the system knows what’s there and will do different things based on that information. We’ll see more of this later.
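A minimal sketch of what “the system knows what’s there” means in practice, assuming any Perl 5 interpreter. Note the use strict line: with it switched on, Perl insists that every variable is declared with my, which is how you opt in to declarations. It is the operator, not the variable, that decides how a scalar is treated, which is also why Perl has separate comparison operators for numbers (==) and strings (eq).

```perl
use strict;     # from here on, variables must be declared with 'my'
use warnings;

my $str = "10";   # holds a string
my $num = 10;     # holds a number

my $sum    = $str + 5;   # numeric context: "10" is treated as 10
my $joined = $num . 5;   # string context: 10 is treated as "10"

print "sum = $sum\n";        # sum = 15
print "joined = $joined\n";  # joined = 105

# Numeric comparison says these are equal; string comparison says they are not
print "numeric: ", ("10" == 10.0   ? "equal" : "different"), "\n";
print "string:  ", ("10" eq "10.0" ? "equal" : "different"), "\n";
```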

Lists can store any number of scalars (which, as you can see, don’t all have to be what most other languages would consider the same type), with Perl performing all the memory allocation and deallocation for you, much as Java’s much-vaunted garbage collector does. Many of the same properties apply to hashes.
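The sketch below shows that bookkeeping in action: a list holding a mixture of strings and numbers, growing and shrinking with push, unshift and pop, and a hash picking up new keys simply by assigning to them. The variable names are illustrative only.

```perl
use strict;
use warnings;

# A list can hold scalars of different "types" side by side
my @mixed = ("hello", 42, 3.14);

# Perl grows and shrinks the array for you; there is no allocation code
push @mixed, "goodbye";    # append to the end
unshift @mixed, 0;         # prepend to the front
my $last = pop @mixed;     # remove and return the last element ("goodbye")

# Assigning past the end extends the array; indices 4..8 become undef
$mixed[9] = "far away";

# Hashes behave the same way: keys spring into existence on assignment
my %ages;
$ages{alice} = 30;
$ages{bob}   = 25;

print "removed $last; array now has ", scalar(@mixed), " elements\n";
print "hash now has ", scalar(keys %ages), " keys\n";
```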

The final interesting property is the way you read from the non-scalar types. As you can see, you must use the dollar (the scalar sign) to access their elements. This does make sense when you stop to think about it: you’re reading a single value, not the complete list (which is still represented using the at sign).
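A short sketch of that rule, reusing the @c and %e names from the earlier example (with slightly different contents). The same logic extends to slices: ask for several elements at once and you get a list back, so the at sign returns.

```perl
use strict;
use warnings;

my @c = ("hello", "goodbye", "ciao");
my %e = (var1 => "value 1", var2 => "value 2");

# One element: the result is a single value, so the sign is $
my $first = $c[0];           # "hello"
my $v1    = $e{var1};        # "value 1"

# Several elements (a "slice"): the result is a list, so the sign is @
my @pair = @c[0, 2];             # ("hello", "ciao")
my @vals = @e{"var1", "var2"};   # ("value 1", "value 2")

# The whole array in scalar context gives its length
my $count = @c;              # 3

print "$first / $v1 / @pair / @vals / $count\n";
```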

The Perl Difference

All this variable stuff is unusual, but it doesn’t make it stand head and shoulders above everything else (in fact, the weak-ish typing and lack of user-defined types make it much worse than many others). But Perl is used in just about every CGI script in existence for a very good reason.

It’s also the reason Perl has a reputation for being unreadable. Generally Perl is no less readable than, say, C, but the one aspect that confuses just about everyone the first time is regular expressions. If you like code that looks like line noise, this is the feature for you!

Regular expressions are a way of representing patterns. For example, this pattern could represent UK postal codes:

^[A-Z]+[0-9]+ [0-9][A-Z]+$

Or in English: one or more letters at the beginning of the line, followed by some digits. Then there’s a space followed by a single digit and one or more letters. The pedantic might note that all the letters must be uppercase.
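Here is a sketch that tries the pattern above against a few made-up strings. As the text hints, it is a simplification rather than a real UK postcode validator: it rejects lower case, and it also rejects genuine outward codes that end in a letter, such as SW1A.

```perl
use strict;
use warnings;

# The simplified pattern from the text, compiled once with qr//
my $postcode = qr/^[A-Z]+[0-9]+ [0-9][A-Z]+$/;

my %verdict;
for my $candidate ("SW1 1AA", "M1 1AE", "sw1 1aa", "SW1A 1AA", "12 3AB") {
    $verdict{$candidate} = $candidate =~ $postcode ? 1 : 0;
    printf "%-9s %s\n", $candidate,
        $verdict{$candidate} ? "matches" : "does not match";
}
```

Only the first two strings match; tightening the pattern to cover every real postcode format is left as an exercise in line noise.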

Like the usual Unix command-line tools, Perl allows you to search for and manipulate patterns in files. Perl extends the usual set of tokens, allowing you to build fantastical expressions that quickly become completely unreadable. The power comes from the fact that all this is embedded right into the language, with no clumsy function calls.

while ($line = <>) {
  chomp $line;
  if ($line =~ /1234/) {
    print ":$line\n";
  }
}

Note that in the third line I compare the input (“<>” reads the next line from the files named on the command line, or from standard input if there are none) to an expression. Clearly this is a very simple example, but you should be able to see that having this functionality built in lets the language express some very complex ideas concisely.
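To give a flavour of what else is built in, here is a variation on the loop above, written as an illustrative sketch. It reads from a string rather than from “<>” so that it runs without any input files, captures parts of a match into $1 and $2, and uses the s/// operator to substitute text in place. The sample lines are invented.

```perl
use strict;
use warnings;

# Invented sample input; open can treat a string as a read-only file
my $input = "order 1234 shipped\norder 5678 pending\nnote: call 1234 back\n";
open my $fh, '<', \$input or die "cannot open in-memory file: $!";

my @orders;
while (my $line = <$fh>) {
    chomp $line;
    # Parentheses capture: $1 is the number, $2 the word after it
    if ($line =~ /^order (\d+) (\w+)$/) {
        push @orders, "$1=$2";
    }
    # Copy the line, then replace every run of digits in the copy
    (my $masked = $line) =~ s/\d+/NNNN/g;
    print "$masked\n";
}
close $fh;

print "orders: @orders\n";
```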

Extras

One of the things that Perl does better than just about any other language is plug-in modules. Perl 5 added some clunky object-oriented-like features and, while they may not be elegant, they do seem to work.
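To see what “clunky” means, here is a minimal sketch of a Perl 5 class. A class is just a package, an object is usually a hash reference that has been blessed into that package, and each method receives the object as its first argument. The Counter class and its start argument are hypothetical.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class for illustration

# Constructor: build a hash reference and bless it into the class
sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} || 0 };
    return bless $self, $class;
}

# Method: the object arrives as the first argument
sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

package main;

my $counter = Counter->new(start => 10);
$counter->increment for 1 .. 3;
print "count is now ", $counter->{count}, "\n";
print "object type: ", ref($counter), "\n";
```

Nothing stops the caller poking at $counter->{count} directly, which is part of the clunkiness; but it works.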

A testament to their power is the number of modules available for free download at CPAN (the Comprehensive Perl Archive Network). There are modules for connecting to just about any relational database, libraries to talk to all the Internet protocols I could think of and code to deal with XML and configuration files (or XML configuration files). There are so many modules you rarely have to write much in the way of code yourself.
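Using a module takes a single use line. Most CPAN modules, database ones such as DBI included, have to be installed first, so this sketch sticks to List::Util, which ships with Perl itself. The numbers are made up.

```perl
use strict;
use warnings;
use List::Util qw(sum max first);   # a core module: no separate download

my @sizes = (120, 45, 300, 7);

my $total       = sum(@sizes);               # 472
my $largest     = max(@sizes);               # 300
my $first_small = first { $_ < 50 } @sizes;  # 45, the first element under 50

print "total=$total largest=$largest first small=$first_small\n";
```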

Summary

I’ve barely begun to scratch the surface of Perl here. There’s much more to it, but the beauty of the language is that you don’t actually need to know that.

The language itself has a tendency to look ugly and, often, unreadable. But it’s used just about everywhere you can find a Unix box. There are few other languages that come close to Perl for hacking text files and automating boring system administration tasks. It takes over where shell scripts leave off and only starts to run out of steam for projects thousands of lines long (it’s technically able to cope with more, but it’s not a real software engineering language). System administrators don’t call it a Swiss Army Chainsaw for no reason.

As a computer language purist, I really want to hate Perl. It has weak typing, no user-defined types, it’s proud of the fact that there’s more than one way to do everything and the syntax just looks plain ugly. But this is not a pure or perfect universe. These very ‘flaws’ make Perl the ideal tool for the jobs it was designed for.

I may not actually like it, but I use Perl for just about all my hacking activities. It’s just too useful to ignore.