
Perl Basics: Becoming a Perl Newbie Extraordinaire
By Lisa Hui
Our goal in this piece is to introduce you to Perl variables, arrays, hashes,
subroutines, and typeglobs, the crash-course way :) Why don't I sound like
I want to go in depth with this? Simply because this will be a breeze for
you, especially if you're familiar with another structured programming language like C++. Even
if you aren't, you could probably go through this cold.
Why not take a shot at it?
Variables
Perl variables can be declared anywhere in the code. In fact, you can create
new variables ("on the fly") as you are writing them into the routines, unlike
older C, where they must be declared at the beginning of a block. Scalar
variables are identified by the $ sign before their names, and these
names can be almost anything. For example, $name is a variable. So
is $TheOldLadyCrossedTheStreet. (Capitalization counts!)
Variables can hold strings, integers, decimals, characters, objects and command output,
without any need to declare the type of data they will store.
Each of these is considered a scalar, which means "one-dimensional" (containing
only one piece of data).
A string:
$sentence = "This is a sentence.";
A character:
$letter = 'c';
A number or a "double":
$number = 2; or $number = 2.03;
Storing the output of a system command (note the backticks, not single quotes):
$output = `pwd`;
#pwd is a system command that returns the full path of the current working
#directory (usually the one the script is run from)
Storing the numerical/boolean value returned by a function
execution:
$isOpened = open(FILE, "file.txt");
Taking things literally:
$money = 'I want to make at least $2000 a week.';
#*wishes really hard*
Assigning the value of one variable to another:
$sentence = $number;
In the last example, I'm setting what was previously a string variable equal
to an integer or double (floating point number - a number with decimal places).
Perl will allow this because it will clear whatever data is in variable $sentence
and substitute the data in $number into it.
If a variable is first used in an expression without ever having been assigned (trying
to access the data in a previously nonexistent variable), it is undefined and will evaluate
to 0, an empty string, or false, depending on where it is used.
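Here is a minimal sketch of the last two points, assuming any Perl 5 interpreter. The variable names are made up for illustration, and the variables are declared with my (covered just below) so the snippet also runs under use strict. Warnings are left off so the undefined value is used silently.

```perl
use strict;

# Sketch only: all variable names here are invented for illustration.
my $sentence = "This is a sentence.";   # starts life as a string
my $number   = 2.03;                    # a number
$sentence    = $number;                 # the old string is simply replaced

my $never_set;                          # declared but never assigned: undefined

my $sum    = $never_set + 5;            # undefined acts as 0 in arithmetic
my $joined = "[" . $never_set . "]";    # undefined acts as an empty string
my $truth  = $never_set ? "true" : "false";   # undefined acts as false

print "sentence = $sentence\n";
print "sum = $sum, joined = $joined, truth = $truth\n";
```

If you switch on use warnings, Perl will still run this but will complain about the uninitialized value, which is a handy way to catch typos in variable names.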
Perl's use of variables may seem sloppy to core programmers, but this type
of "flexibility" in design shows that the structure of perl and how it interprets
data is different. Perl variables differentiate between plural and singular.
As O'Reilly's "Bible of Perl" so eloquently states it, "Strings and numbers
are singular pieces of data [scalars], while lists of strings or numbers
are plural."
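Perl even has its own way of dealing with the "scope" of variables. By default,
all variables are global (they can be accessed by any subroutine (function) in
the entire script). To "localize" one to a single subroutine, you have to declare
it with my, which might look something like my $variable; In
this case, the subroutine sounds very possessive. (It's mine
I tell you!) That's the idea now :)
Instead of using my, you can use local, which sounds more technical
and to the point! The two are not identical, though: my creates a new variable
that is visible only inside the enclosing block, while local gives an existing
global a temporary value that is restored when the block exits (and that any
subroutine called in the meantime will see). But shorter ones work for me :)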
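To see how my and local differ in practice, here is a small sketch. The variable $mood and the three subroutine names are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Sketch only: $mood, show_mood, with_my and with_local are invented names.
our $mood = "global";            # a global (package) variable

sub show_mood { return $mood; }  # always reads the global $mood

sub with_my {
    my $mood = "lexical";        # a brand-new variable, private to this block
    return show_mood();          # show_mood still sees the global
}

sub with_local {
    local $mood = "temporary";   # temporarily changes the global itself
    return show_mood();          # show_mood sees the temporary value
}

my $from_my    = with_my();
my $from_local = with_local();
my $afterwards = show_mood();    # local has restored the old value by now

print "$from_my, $from_local, $afterwards\n";   # global, temporary, global
```

In everyday scripts my is usually the one you want; local is mostly useful for temporarily changing Perl's own special variables.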
Arrays
Arrays are a collection of variables. In a sense, they create a numbered
list of values. Arrays are usually identified by a @ sign before their
names. @name would be able to hold several names by placing them in numbered
slots: $name[0],$name[1],$name[2] etc.
How would you create these? You could manually assign them like:
$name[0] = "Barbie Dahl";
$name[1] = "Curious George";
Notice that the starting index (the number of the slot) is 0, not 1! Or you
could let the array index the values you give it by itself:
@name = ("Barbie Dahl", "Curious George");
Notice that you can access any value in the array called @array (where
the name of the array can be almost anything) simply by using
$array[index], where index is the number of the slot.
Maybe you want to know how many elements are stored in the array? To find
out how many elements are in @array, we can take advantage of the
fact that $#array will return the index value (an integer) of the
last filled element in the array ($array[$#array] will return
the last value stored in @array). Since array indices start at 0 and
not 1, the last index value will be one less than the number of elements
in the array. Just add 1 to this value and you've got what you need :)
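A quick sketch of the counting trick, using the @name array from above plus one extra, made-up entry. It also shows scalar(@name), a built-in way to get the count directly.

```perl
use strict;
use warnings;

# "Ben" is an extra entry added only for this illustration.
my @name = ("Barbie Dahl", "Curious George", "Ben");

my $last_index = $#name;          # index of the last element: 2
my $count      = $#name + 1;      # number of elements: 3
my $same_count = scalar(@name);   # an array in scalar context gives its size: 3
my $last_value = $name[$#name];   # the last value stored: "Ben"

print "last index $last_index, count $count, last value $last_value\n";
```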
Hashes
Hashes are a lot like arrays, except that their index values aren't numbers,
but keys, so they're often called "unordered arrays" (seems to fit
the description, doesn't it?). They are identified by a %
sign before their names. How does this work? An individual hash element
would look like $name{key}, where the key refers to the right element.
How would you create these? Similar to arrays, you can assign each element
manually by designating the key and the value that it should map
to:
$days{'Sun'} = "Sunday";
$days{'Mon'} = "Monday";
$days{'Tues'} = "Tuesday";
$days{'Wed'} = "Wednesday";
$days{'Thurs'} = "Thursday";
$days{'Fri'} = "Friday";
$days{'Sat'} = "Saturday";
However, you can also set the keys (which in our example are abbreviations for the days
of the week) and their values all at once:
%days = (
"Sun" => "Sunday",
"Mon" => "Monday",
"Tues" => "Tuesday",
"Wed" => "Wednesday",
"Thurs" => "Thursday",
"Fri" => "Friday",
"Sat" => "Saturday",
);
And just for good measure, you're allowed to have a comma after the last
days key-and-value pair, even though it is not necessary.
What if you wanted to use a scalar variable as a key? That's possible. Let's
see this example to make sure of the syntax (don't use literal quotes unless
you have to!):
$day = "Mon";
$days{$day}= "Monday";
And remember, the key to each hash element is almost anything you
want it to be, not necessarily an abbreviation of the entry itself.
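Putting the hash pieces together, here is a short sketch using a trimmed-down version of the days hash. The built-in functions keys and exists are not covered above; they are shown here as one common way to list the keys and to check whether a key is present.

```perl
use strict;
use warnings;

# A shortened version of the days hash, for illustration only.
my %days = (
    "Sun"  => "Sunday",
    "Mon"  => "Monday",
    "Tues" => "Tuesday",
);

my $day       = "Mon";
my $full_name = $days{$day};                          # scalar used as the key
my $has_fri   = exists $days{"Fri"} ? "yes" : "no";   # "Fri" was never stored
my @abbrevs   = sort keys %days;                      # keys come back unordered, so sort them

print "$day is short for $full_name\n";
print "Is Fri in the hash? $has_fri\n";
print "Keys: @abbrevs\n";
```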
Splitting
This is an introduction to the first built-in routine, also known as
an operator, that you should be acquainted with if you're interested in
learning how to add the interactivity element to your CGI scripts.
It does what it says it does: split data. The simplicity of this is in the
fact that you merely need to give it the expression that you need to split
and what criterion you need it to split this data by. The split operator
will then return all the values that have been split. This is why you might
very well want to store the results of the split in variables, or even an
array.
@results = split(/what-to-split-by/, expression);
Let's take an example and see what the results are:
@values = split(/&/, "Ben&Jerrys");
If you check the array now, you will realize that it has basically done
the following:
$values[0] = "Ben";
$values[1] = "Jerrys";
The expression can also be a variable (which hopefully contains a value!):
$phrase = "Cat&Mouse";
@values = split(/&/, $phrase);
And you would end up having accomplished basically the
following:
$values[0] = "Cat";
$values[1] = "Mouse";
A subtle mistake can occur when the "what-to-split-by" value
is not a literal character. This would be the case if "what-to-split-by"
is a metacharacter (see Simple CGI
for further reference). For example: @values = split(/|/, $phrase); would
not split on the pipe, because ( |
) is a metacharacter (it means "or" in a pattern). The pattern ends up matching
the empty string, so the phrase is broken into single characters instead. This is
another situation in which backslash comes
in handy (literal quotes: ' ' will not help here, since split still treats a
quoted delimiter as a pattern). The revised example,
which does what you want: @values = split(/\|/, $phrase);
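Here is a small sketch comparing the two patterns on a made-up phrase that uses a pipe as its separator. On the Perl 5 interpreters I am aware of, the unescaped pattern is accepted without complaint and simply matches between every character, so treat the count in the comment as an observation rather than a guarantee.

```perl
use strict;
use warnings;

# "Cat|Mouse" is an invented input for this illustration.
my $phrase = "Cat|Mouse";

my @unescaped = split(/|/, $phrase);    # | means "or": matches the empty string
my @escaped   = split(/\|/, $phrase);   # \| means a literal pipe character

print scalar(@unescaped), " pieces without the backslash\n";   # one per character
print scalar(@escaped), " pieces with the backslash: @escaped\n";
```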
What if you want to set all the data in an array to a different variable
name:
$first = $values[0];
$second = $values[1];
Here's how you can do this in one line (especially handy if you have more
than 2 values stored in the array). Since we know @values has only 2 entries
in the array, we can set this like so:
($first,$second) = @values;
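What happens when the array and the list of variables are not the same length? A short sketch, assuming a three-part phrase invented for this example:

```perl
use strict;
use warnings;

# "Cat&Mouse&Dog" is a made-up phrase with three parts.
my @values = split(/&/, "Cat&Mouse&Dog");

my ($first, $second) = @values;   # the extra value ("Dog") is quietly dropped
my ($head, @rest)    = @values;   # an array on the left soaks up whatever remains
my ($a, $b, $c, $d)  = @values;   # not enough values: $d is left undefined

print "first=$first second=$second\n";
print "head=$head rest=@rest\n";
print "d is ", (defined $d ? "defined" : "undefined"), "\n";
```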
Subroutines
A subroutine is the Perl equivalent of a C/C++ function; however, there are
some differences (as you've probably guessed) in the way these subroutines
are called and how they are defined.
You create subroutines by declaring them like so:
sub name_of_function {
#routines going in here
}
To execute (invoke within other subroutines or just the main stream of actions
being performed) them, we call the subroutine by name with an ampersand &
before the name. To execute name_of_function, we would write:
&name_of_function;
How about passing parameters between functions?
Call it using &name_of_function(@parameters); In this case, the
ampersand is optional. Why? You can call a subroutine by just invoking its
name (without the ampersand '&')! Then why did I just tell you to use
&name_of_function; ? This is because calling
&name_of_function; with no parentheses passes along the caller's current default
array @_ as the parameter list. However, it is perfectly ok to specify what list
of parameters you want to pass in the call to the subroutine.
Moreover, you can invoke a subroutine with parameters directly, which end
up being stored in the default (and local) array @_ . For example:
&name_of_function("parameter1","parameter2");
The line above would store "parameter1" and "parameter2" in @_ , which is
the array you can access to retrieve the parameters once you write the sub
name_of_function {} routine.
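A minimal sketch of reading parameters out of @_ , including the bare &name_of_function; form. The relay subroutine is hypothetical and exists only to show that form in action.

```perl
use strict;
use warnings;

sub name_of_function {
    my ($first, $second) = @_;    # copy the parameters out of @_
    my $count = scalar(@_);       # how many parameters were passed
    return "$count parameters: $first and $second";
}

# relay is an invented helper: the bare &name_of_function; call inside it
# hands relay's own @_ straight on to name_of_function.
sub relay {
    return &name_of_function;
}

my $direct  = name_of_function("parameter1", "parameter2");
my $relayed = relay("a", "b");

print "$direct\n";    # 2 parameters: parameter1 and parameter2
print "$relayed\n";   # 2 parameters: a and b
```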
Can you prototype them (declare that the subroutine exists and then define
it later in the script)?
Yes, in its most basic form, a function will take on a C++ like syntax when
creating a prototype. Let's figure out what we want the prototyped function
to accept as parameters (or none at all) and then just declare it: sub
name_of_function(); declares a subroutine that takes no parameters, to be
called as name_of_function();
A sub name_of_function($); declaration would prototype this function to accept
one scalar variable as a parameter (which doesn't necessarily have to
be in parentheses like the prototype!). Keep in mind that Perl ignores the
prototype when you call the subroutine with the ampersand. Some other popular
declarations are listed below:
| Declaration | In Use |
| sub name_of_function($$); | name_of_function($first, $second); |
| sub name_of_function(@); | name_of_function($first,$second,$third,$fourth); |
| sub name_of_function(\@); | name_of_function(@myArray); |
| sub name_of_function(\%); | name_of_function(%myHash); |
Are you a bit confused by this? Maybe these example groups below may help
you get the hang of all these rules I'm jamming in here:
sub function($$); # prototyped to take two scalars
&function(1,2); # passes these two numbers as scalar arguments
function(1,2); # the same thing
sub function(); # prototyped to take no arguments
&function(); # passes nothing (an empty list)
function(); # passes nothing
function; # the same, but if not predeclared, will return an error
# the calls below shouldn't require prototyping
&function; # gets arguments from the caller's @_
function(@_); # roughly the same, with the list written out
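If you'd like something you can actually run, here is a sketch with two prototyped subroutines. The names add_two and count_items are hypothetical. It assumes classic Perl 5 behaviour, where parentheses after a sub name are a prototype (the newer "signatures" feature, if switched on, changes what they mean).

```perl
use strict;
use warnings;

# add_two and count_items are invented names for this illustration.
sub add_two($$);        # prototype: exactly two scalars
sub count_items(\@);    # prototype: one real array, handed over as a reference

sub add_two($$) {
    my ($first, $second) = @_;
    return $first + $second;
}

sub count_items(\@) {
    my ($array_ref) = @_;           # thanks to \@ we receive a reference
    return scalar(@$array_ref);     # number of elements in the original array
}

my @myArray = (10, 20, 30);

my $sum      = add_two(1, 2);          # checked against the ($$) prototype
my $count    = count_items(@myArray);  # @myArray is not flattened into a list
my $bypassed = &add_two(1, 2, 99);     # & skips the prototype check entirely

print "sum=$sum count=$count bypassed=$bypassed\n";
```

Note that the third argument in the & call is accepted without complaint and simply ignored by the subroutine body, which is why prototypes should not be relied on for validation.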
Will calling a function in the script before I declare/define it give
me an error?
Nope, Perl will find the function as long as it is defined somewhere in the
script (or in an included file) because the script is "compiled" and
tested (sketchily) for veracity before it actually executes on the server.
The error messages you will get will most frequently occur when there's
some kind of syntax error (well...most likely).
Subroutines can also be defined in other files and "included" in the script
using do, require or use commands (operators that work
like the #include call in C and C++, except these can be called anywhere
in the script, not only at the beginning). They basically take the code in
the file and load it into the script when the script runs (when it is called
through the browser in our case); do and require do this at run-time, while use
does it as the script is compiled. We will discuss more about including files,
reading and writing to them and more in
Perl Basics: Reading, Writing,
and Taking a Bite. As for eval, which can create subroutines
"on-the-fly," we'll get to that when we get there (trying to follow a "less
confusion, the better" policy here) :)
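For the curious, here is a self-contained sketch of require in action. So that it can run on its own, it first writes a tiny library file to a temporary location using the standard File::Temp module; in a real script the file would already exist. The greet subroutine is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway "library" file. The greet subroutine inside it is
# invented for this illustration.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} <<'END_LIB';
sub greet { return "Hello, $_[0]!"; }
1;
END_LIB
close($fh);

# require loads and runs the file once, at run-time. The file must end
# with a true value (the "1;" line), or require will complain.
require $path;

my $greeting = greet("world");   # greet now exists in our script
print "$greeting\n";
```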
Defaults
We've already mentioned the default array @_ which is always available
in subroutines. You can easily access its elements through $_[index] like
any other array. As well, sometimes it's just preferable not to have to
name a variable and use $_ instead (which is not the same as the @_ indexed values).
When you are not explicitly setting values, often you will find out that
Perl will take care of the dirty work and set default variables
and arrays for you.
For example, if you had not specified the expression in the split function,
it would have automatically tried to access a default $_ value. Where else
will this be tried?
- list functions like print and unlink
- pattern matching operators when used without a =~ operator
- the loop variable in a foreach loop when you don't name one: foreach (@list) {}
  rather than foreach $scalar (@list) {}
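The cases above can be sketched in a few lines. The list contents are made up for illustration.

```perl
use strict;
use warnings;

# Invented sample data.
my @list = ("apple", "banana", "cherry");

my @matched;
foreach (@list) {                 # no loop variable named, so each item lands in $_
    push @matched, $_ if /an/;    # a pattern match without =~ tests $_
}

my @parts;
for ("Cat&Mouse") {
    @parts = split /&/;           # split with no expression splits $_
}

foreach (@matched) {
    print;                        # print with no arguments prints $_
    print "\n";
}
print "parts: @parts\n";
```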
$! is the scalar variable that stores error messages (like when you
fail to open a file for whatever reason - including if the file does not
exist).
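A small sketch of $! at work, assuming the (hypothetical) file named below really does not exist in the current directory:

```perl
use strict;
use warnings;

# Hypothetical filename, assumed not to exist.
my $missing = "no_such_file_for_this_example.txt";
my $message = "";

if (open(my $fh, '<', $missing)) {
    close($fh);
} else {
    # $! holds the reason the last system call failed.
    $message = "Could not open $missing: $!";
}

print "$message\n";
```

The exact wording of the error text comes from the operating system, so it can differ from one server to another.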
As well, as we will see in our Form CGI Case
Study, there are default filehandles and "buffer" values when getting
data through a form that is placed on an HTML-coded page. They probably have
an unofficial denomination as "global special" variables, arrays and filehandles.
Below is a list of some of the most common ones:
| Variable | Use |
| $_ or $ARG | Default scalar input variable, also used for pattern matching |
| $! or $ERRNO | Stores the value of the errno variable (which deals with the last system call error). |
| $[ | The index of the first element in an array and of the first character in a string (0) |
| $] or $PERL_VERSION | Stores the value of the version of the perl interpreter installed on the server + patchlevel / 1000 |
| $^T or $BASETIME | The time when the script began running (seconds since 1970 on UNIX servers) |
| $^O or $OSNAME | Name of the operating system the Perl binary was compiled for. |
| $ARGV | The name of the current file read from <ARGV> |
| $0 or $PROGRAM_NAME | The name of the perl script being executed |
| Lists | Use |
| @INC or %INC | The array @INC contains the list of places to look for Perl scripts to use with do, require or use. The hash %INC contains the filename of each file that has been included into the script through do or require (the key is the filename specified and the value is its location) |
| @ARGV | Contains command line arguments for the script (if you're invoking it through a telnet prompt) |
| %ENV | Environment Variables |
| Filehandles | Use |
| ARGV | Iterates over command line given filenames in @ARGV (can be written as a NULL filehandle: <>) |
| STDERR | Standard Error |
| STDIN | Standard Input |
| STDOUT | Standard Output |
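To get a feel for a few of these, here is a sketch that copies some of the special variables into ordinary ones and prints them. What you see will depend on your own server and how the script is invoked, so no particular output is promised.

```perl
use strict;
use warnings;

my $script    = $0;                # name of the script being executed
my $os        = $^O;               # operating system perl was built for
my $started   = $^T;               # start time, in seconds since 1970
my $elapsed   = time() - $^T;      # seconds the script has been running
my $arg_count = scalar(@ARGV);     # number of command line arguments
my $path      = exists $ENV{PATH} ? $ENV{PATH} : "(not set)";

print "Script: $script\n";
print "OS: $os\n";
print "Started at: $started ($elapsed seconds ago)\n";
print "Arguments: $arg_count\n";
print "PATH: $path\n";
print STDERR "This line goes to standard error instead.\n";
```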
