Programming Languages at a Glance
Chapter 16. Perl

16.2. Quick Tour

16.2.1. Expressions and Context

The hello world example is identical to Python's version apart from the semicolon ending the print statement.

print "Hello World";
-> Hello World

Strings can be delimited in multiple ways, each causing the string to be interpreted differently. With double quotes, for example, Perl interprets C's escape sequences (and also interpolates variables, as we will see later on). With single quotes, the string is taken literally.

print "Hello\tWorld\n";
print 'Hello\tWorld\n';

-> Hello   World
Hello\tWorld\n

In contrast to Python, print is a function. Function calls look like those in most procedural languages, except that the parentheses around the arguments can be omitted. The following statements are equivalent.

print 123, "blah", 5.6;
print(123, "blah", 5.6);

-> 123blah5.6123blah5.6

By default, the print function does not separate the printed fields and records (that is, multiple calls to the print function). But we can change this behavior by setting the special field and record separator variables. To keep our examples short, we emulate Python's print statement by setting the field separator to a space and the record separator to a newline.

$, = " ";
$\ = "\n";
print 123, "blah", 5.6;
print(123, "blah", 5.6);

-> 123 blah 5.6
123 blah 5.6

Besides the plain print function, Perl also supports C's formatted printing to streams and strings.
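As a minimal sketch of the formatted printing mentioned above: printf writes to a stream and sprintf returns a string, both using C-style format directives. The values and format strings are made up for illustration.

```perl
use strict;
use warnings;

# printf writes formatted output to a stream (STDOUT by default).
printf("%-5s|%5.2f|%03d\n", "ab", 3.14159, 7);

# Naming a filehandle sends the output to that stream instead.
printf STDERR "%s\n", "a message on standard error";

# sprintf returns the formatted string instead of printing it.
my $s = sprintf("%d items cost %.2f", 3, 4.5);
print "$s\n";
```

The first call prints "ab   | 3.14|007", and $s contains "3 items cost 4.50".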

Basic arithmetic works as expected as well. The meaning of the operators is derived from C (and awk).

print 3 + 4*5 + 2**3;
-> 31

The next examples give us a first idea of Perl's context sensitivity.

print 10 + "10";
print 10 . "10";
print 10/3;
print 5.5 % 3;
print "ab" x 3;
print 10 x 5.5;
-> 20
1010
3.33333333333333
2
ababab
1010101010

Perl interprets every expression in its context and freely converts between types as required. The first example adds the string "10" to the number 10. Since addition is defined for numbers only, the string is implicitly converted to a number. For string concatenation (the dot operator), the opposite happens, and the number is converted to a string. The third example shows that the division operator uses floating point arithmetic (unless you switch to integer mode with use integer;). The remainder operator %, on the other hand, is an integer operator, which causes the floating point number 5.5 to be converted to the integer 5 before the remainder is computed (in Python the same expression would return 2.5). The letter x denotes the string replication operator, which takes the string on the left hand side and the number of repetitions on the right hand side. Because of this, the number 10 is interpreted as the string "10" and the number 5.5 as the integer 5.
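The integer mode mentioned above can be sketched as follows. The pragma is lexically scoped, so it only affects the enclosing block; the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $float_div = 10 / 3;       # floating point division
my $int_div;
{
    use integer;              # applies to this block only
    $int_div = 10 / 3;        # integer division
}
my $truncated = int(10 / 3);  # explicit truncation without the pragma
print "$float_div $int_div $truncated\n";
```

Both $int_div and $truncated are 3, while $float_div keeps its fractional part.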

As a consequence of all these implicit type conversions, more operators are required to avoid ambiguities. For example, Perl cannot overload the plus operator to also mean string concatenation, since the expression 10+"10" could be either 20 or "1010". Similarly, string replication must use a separate operator and insist on a fixed order of arguments in order to make sense of expressions like the ones given above. The same is true for the comparison operators. Perl uses the normal mathematical comparison symbols <, <=, ==, >=, > for numbers and the Fortran-like operators lt, le, eq, ge, gt for strings.

print "le", ("-10" le "+10")? "true" : "false";
print "<=", ("-10" <= "+10")? "true" : "false";

-> le false
   <= true

In the first statement, the "less or equal" comparison is carried out on strings using the le operator. Since the plus sign (43) comes before the minus sign (45) in the ASCII character set, the string "-10" is greater than the string "+10". With the numerical <= operator, the strings are implicitly converted to numbers and we get the numerically correct result.
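A second small sketch of the same effect, using the made-up operands "9" and "10": compared as numbers, 9 is smaller than 10, but compared as strings, "9" sorts after "10" because the character 9 comes after the character 1.

```perl
use strict;
use warnings;

my $numeric = ("9" <  "10") ? "true" : "false";   # compared as the numbers 9 and 10
my $string  = ("9" lt "10") ? "true" : "false";   # compared character by character
print "numeric: $numeric, string: $string\n";
```

This prints "numeric: true, string: false".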

Perl became famous for its string processing in general and regular expressions in particular. Strings enclosed in forward slashes are interpreted as regular expressions. Together with the match operator =~, we arrive at a very compact syntax for checking strings against regular expressions.

print "x1=y1" =~ /^\w+=\w+$/ ? "true" : "false";
-> true

Here, we test whether the string "x1=y1" consists of two alphanumeric words separated by an equals sign (more on regular expressions later). The backslashes are now interpreted in the context of a regular expression. Followed by the letter w, a backslash represents the set of alphanumeric characters (plus the underscore).

16.2.2. Variables

Perl knows three kinds of variables, each with different scoping rules: global, local, and lexical. Global variables live in the namespace of the current package (see below; for now we have only used the main package). They are visible throughout the whole program. Local variables are dynamically scoped: they override the value of a global variable for the duration of a block, including everything called from that block. Finally, lexical variables are lexically scoped, that is, they are only visible within the block they are defined in. Perl's lexical variables correspond to the local variables we are used to from Scheme and the C family.

Global variables are the default. Local variables have to be declared with the local keyword, lexical ones with the my keyword. In practice, most programs use lexical variables almost exclusively, so for most applications it is safe to assume that variables have to be declared with my. In this chapter, we will restrict ourselves to lexical variables (as we will see, this gets complicated enough in Perl). To ensure that we do not define global variables by accident, we follow the common Perl practice of switching into strict mode and turning on warnings at the beginning of the script.

use strict;
use warnings;
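The difference between the three kinds of variables can be sketched as follows. This is only an illustration: the names $level, show, with_local, and with_my are made up, and the our keyword is used here as one way to declare a package (global) variable that strict mode accepts.

```perl
use strict;
use warnings;

our $level = "global";          # package (global) variable

sub show { return $level; }     # always reads the global $level

sub with_local {
    local $level = "local";     # temporarily overrides the global value
    return show();              # callees see the override (dynamic scope)
}

sub with_my {
    my $level = "lexical";      # new variable, visible only inside this block
    return show();              # show() still sees the global
}

my @seen = (with_local(), with_my(), show());
print "@seen\n";
```

This prints "local global global": only the local declaration is visible inside the called function, and the override ends when with_local returns.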

As one of Perl's curiosities, variables use the symbols dollar ("$"), at ("@"), and percent ("%") as prefixes depending on the context. Let's start with the simple case of scalar variables which start with a dollar sign.

my $x = "Hello World";
print $x;

-> Hello World

With the conventions we introduced above, the whole program looks like this:

use strict;
use warnings;

$\ = "\n";
$, = " ";

my $x = "Hello World";
print $x;

We introduce the new lexical variable with the my keyword and initialize it with the string "Hello World". Alternatively, we could have declared the variable separately and assigned the string afterwards.

my $x;
$x = "Hello World";
print $x;

We can also declare and initialize multiple variables at the same time using my with a list of variables.

my ($x, $y) = ("John", "Joe");
print $x, $y;

-> John Joe

When trying to use a variable outside of its scope, we get a compile error.

{
    my $x = "Hello";
}
print $x;

-> Global symbol "$x" requires explicit package name

The compiler does not tell us that we are trying to use an undeclared variable. Instead, it assumes that we want to access a global variable without the package name required by strict mode. Without strict mode, the print statement would have referred to the global variable $x in the main package. The program would have compiled without errors, but the output would have been empty, since the global variable $x was never initialized.

Knowing variables, we can now return to string interpolation. We often would like to insert the value of a variable into a string. In Perl (as in UNIX shell languages), this is done automatically when using double quotes to delimit the string.

my $name = "Frank";
print "My name is $name";

-> My name is Frank

This process of substituting variables in strings is called interpolation.

Next to regular expressions, Perl's most prominent features are the built-in collection types, arrays and hashes. Both names refer to the underlying implementation. Arrays are dynamically resizing vectors containing scalar values. Hashes are maps (implemented as hash tables) with scalar keys and values.

With these collections, the context dependent variable symbols start to get funny. When referring to an array as a whole, a variable starts with an "at" sign, and a hash is indicated by the percent.

my @a = (1, 2, 3);
my %h = ("John", 55, "Joe", 33, "Mary", 110);

print @a;
print %h;

-> 1 2 3
Mary 110 Joe 33 John 55

Both arrays and hashes are initialized with list literals, which are just lists of comma separated values. In a hash context (e.g., assignment to a hash variable), the list is viewed as an alternating list of keys and values. In Perl programs you will often find a special shortcut for lists containing strings. It starts with the keyword qw (quoted words) followed by a list of unquoted strings, separated by whitespace and enclosed in some delimiter (often slashes or parentheses). The following expressions are all equivalent.

@a = ("John", "Joe", "Mary");
@a = qw/John Joe Mary/;
@a = qw(John Joe Mary);

Using this special list quote you save the double quotes and the commas (some people may not consider this worth a new syntax).

Another, more useful syntactical element for lists is the arrow =>. It can be used wherever the comma is used in a list and makes the definition of hashes more readable.

my %h = ("John" => 55, "Joe" => 33, "Mary" => 110);

Besides improving readability, it also automatically quotes a bare word on its left hand side, so that the expression above can be shortened to the following version.

my %h = (John => 55, Joe => 33, Mary => 110);

You can also confuse the reader (which might be yourself), by using the arrow arbitrarily.

my @a = (1, 2 => 3 => 4, 5);
my %h = ("John", 55 => "Joe", 33 => "Mary" => 110);

The result is always the same.

To construct lists of consecutive values of some enumerated type, Perl also supports a range operator, written as two dots (..).

print (0 .. 10);
print ('a' .. 'g');

-> 0 1 2 3 4 5 6 7 8 9 10
a b c d e f g

The ranges include both limits (closed interval).

The funny thing about Perl's type prefixes is that they change depending on the usage of the variable (for the same variable!). Whenever we access an individual element of an array or a hash, the prefix becomes a dollar sign.

print $a[0], $h{"John"}, $h{John};

One could argue that the indexing extracts a scalar, but seriously, there is no convincing argument for this rule. Also note that a hash uses curly braces as the subscript operator. The hash subscript operator also automatically adds quotes to an unquoted string (just like qw and the arrow =>).

Perl's arrays also support negative indexes (counting from the end of the array) and slices.

my @a = (1 .. 5);
print $a[-2];
print @a[1..3];

-> 4
2 3 4

Since a slice yields a list of elements rather than a single scalar, the "at" sign is used as a prefix.

Here are a few more curiosities. What do you think is the value of an array in a scalar context, for example, when assigning it to a scalar variable?

my @a = ("a", "b", "c");
my $n = @a;
print "n=$n";

-> n=3

Yes, it is the length of the array (the only reasonable scalar value of an array). What about an array literal in a scalar context?

my $x = ("John", "Joe", "Mary");
print "x=$x";
-> x=Mary

It is not interpreted as an array at all, but as an expression (in parentheses) using the comma operator. The comma operator evaluates both sides and returns the value of the right hand side. If applied multiple times, the value of the rightmost expression, here "Mary", is returned.

Since the different types of variables live in different name spaces, you can use the same name for a scalar variable, an array, and a hash.

my $a = "blah";
my @a = ("a", "b", "c");
my %a = (x => 1, y => 2, z => 3);

print '$a:', $a;
print '@a:', @a;
print '$a[1]:', $a[1];
print '%a:', %a;
print '$a{x}:', $a{x};

-> $a: blah
@a: a b c
$a[1]: b
%a: x 1 y 2 z 3
$a{x}: 1

Internally, Perl uses a "multi-cell" approach. Every symbol such as "a" has an associated structure with one cell for scalars, one cell for arrays, one cell for hashes, and so forth. Recall that the Common Lisp implementation uses two cells (one for "ordinary" values, one for functions) whereas Scheme is a one-cell implementation. As we have already seen when comparing Lisp to Scheme, more cells mean that objects cannot be handled homogeneously, which has to be accommodated by a more complex syntax.

16.2.3. Control Statements

Perl offers the whole set of control statements of procedural languages with a C-like syntax and adds a few less common variations. For example, there is not only the typical if/else statement, but also the negated variant unless/else.

my $x = 55;
if    ($x < 10)  { print "small"; }
elsif ($x < 100) { print "medium"; }
else             { print "big"; }

unless (1 < 2) { print "false"; } else { print "true"; }

The while and for loops work like their C counterparts. You can jump to the next iteration using next (equivalent to C's continue) and leave the loop using last (equivalent to C's break). It is even possible to repeat an iteration using the redo command. If you need some statements to be executed between two iterations, they can be placed in an optional continue block just after the while block. Just as unless is a shortcut for if-not (in Perl: if (!...)), until is the negated version of while.
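A minimal sketch of these loop controls, with made-up variable names: the first loop collects the even numbers up to 6 using next, last, and a continue block, and the second counts down with until.

```perl
use strict;
use warnings;

my @out;
my $i = 0;
while ($i < 10) {
    next if $i % 2;          # skip odd numbers (the continue block still runs)
    last if $i > 6;          # leave the loop entirely
    push @out, $i;
} continue {
    $i++;                    # executed between two iterations
}
print "@out\n";

my $n = 3;
my @countdown;
until ($n == 0) {            # same as: while (!($n == 0))
    push @countdown, $n--;
}
print "@countdown\n";
```

This prints "0 2 4 6" followed by "3 2 1". Note that the increment has to live in the continue block here; placed at the end of the loop body, it would be skipped by next and the loop would never terminate.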

You can also iterate through an array using the foreach loop (the keyword can also be abbreviated to for). Apart from the different syntax (and less general applicability), it works like Python's for-in loop.

my @a = (1 .. 3);
foreach my $i (@a) {
    print "i=$i";
}
foreach (@a) {
    print;
}

-> i=1
i=2
i=3
1
2
3

If no loop variable is specified, the current element is assigned to the special variable $_. Since the print function prints this special variable by default, we arrive at the short form shown in the second loop.

Iterating through the key-value pairs of a hash map is best accomplished with the each function.

my %h = ("John" => 55, "Joe" => 66);
print each %h;
print each %h;
print each %h;
print each %h;

With each call, the each function returns the next key-value pair in the map until there are no entries left. It then returns an empty list. Calling each again starts the process all over again (so the state of the iterator must be stored in the hash itself). Combined with a while loop, we get all the key-value pairs of a hash.

my %h = ("John" => 55, "Joe" => 66);
while (my ($key, $value) = each %h) {
    print $key, $value;
}

-> Joe 66
John 55

The conditional statements if and unless and the loop statements while and until can also be put behind a statement as a so-called modifier.

print "too big" if ($x > 100);

@a = ("x", "y", "z");
my $elem;
print "elem: $elem" while $elem = shift @a;

16.2.4. References

Here is another quiz: What happens if we put an array into an array? More concretely, what is the length of the following array?

my @a = (1, 2, (3, 4, 5));
my $n = @a;
print "n=$n";

-> n=5

The resulting array contains the numbers one to five! The list (3, 4, 5) is not inserted into the array as a whole, but unpacked and inserted element-wise. Perl's collections contain scalar values only. As another little surprise, trying to store an array in a hash stores only the first element of the array (neither the length nor the last element): the flattened list is read as alternating keys and values, so "John" maps to 3 and 4 maps to 5.

my %h = ("John", (3, 4, 5));
print $h{"John"};

-> 3

At this point it is time to introduce another scalar type, the reference. A reference is a safe version of a C pointer, and like C's pointers, Perl's references add a lot of power to the language. To obtain a reference to an object, we apply the backslash operator. Going from a reference to the object it points to can be accomplished in multiple ways. In the simplest case, we use the reference variable as a variable name, that is, wherever you would normally use the name of the variable (without the prefix symbol), you now use the name of the reference variable (including the dollar sign). Here is a scalar example.

my $x = "Hello";
my $rx = \$x;
print $rx, $$rx;
$x = "World";
print $rx, $$rx;

-> SCALAR(0x0173f4) Hello
   SCALAR(0x0173f4) World

And here are some examples using references to arrays and hashes.

my @a = ("John", "Joe", "Mary");
my $ra = \@a;
my $n = @$ra;
print "n=$n", $$ra[1], $ra->[2];

-> n=3 Joe Mary

For indexing through a reference there is an alternative syntax that uses an arrow after the reference ($ra->[2]) instead of an extra dollar prefix ($$ra[2]) as the dereferencing operator. The same works for hashes.

my %h = ("John" => 55, "Joe" => 66, "Mary" => 77);
my $rh = \%h;
print $$rh{"John"}, $rh->{"John"};

-> 55 55

References do not need to point to variables but can also reference values ("anonymous data" in Perl parlance) directly.

my $rx = \"Hello";
print $$rx;
-> Hello

Because of the context sensitivity of the comma operator, it is not possible to create a reference to an anonymous array by just putting a backslash in front of it.

my $ra = \("John", "Joe", "Mary");
print $$ra;

-> Mary

Instead, there is a special syntax for references to anonymous arrays and hashes which makes them almost look like Python lists and maps.

my $ra = ["John", "Joe", "Mary"];
my $rh = { "John" => 55, "Joe" => 55 };
print $ra->[0], $rh->{"John"};

-> John 55

Now, using references, we can put non-scalar values in collections.

my @a = (1, 2, [3, 4, 5]);
my $n = @a;
print "n=$n", $a[2]->[1], $a[2][1];

-> n=3 4 4

Note that the dereferencing arrow between array and hash subscripts can be omitted, that is, $a[1]{"a"}[2] is equivalent to $a[1]->{"a"}->[2].
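A small sketch of such a nested structure, using hypothetical records (an array of hash references, each holding an array reference), shows that both spellings reach the same element.

```perl
use strict;
use warnings;

# Hypothetical data: an array of hash references, each with an array reference inside.
my @people = (
    { name => "John", scores => [55, 60] },
    { name => "Mary", scores => [110, 95, 80] },
);

my $explicit = $people[1]->{scores}->[2];   # every arrow written out
my $implicit = $people[1]{scores}[2];       # arrows between subscripts omitted
my $count    = @{ $people[1]{scores} };     # array in scalar context: its length
print "$explicit $implicit $count\n";
```

This prints "80 80 3".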

Using the reference variable as a variable name, we always have to first assign the reference to a variable before we can use it. A typical situation is a function returning a reference.

sub names { return ["John", "Joe", "Mary"]; }
my $x = names();
print @$x;

-> John Joe Mary

To avoid the temporary variable, Perl also allows an arbitrary block (returning a reference) to be used in place of a variable name.

sub names { return ["John", "Joe", "Mary"]; }
print @{names()};

Recalling the interpolation property of strings in double quotes, we can now put arbitrary expressions into strings.

print "3 + 4 = ${\(3+4)}";
-> 3 + 4 = 7

Here we compute the result of "3+4", take a reference to it using the backslash operator, and use it as the "variable name" for the interpolated variable. As usual, Perl's syntax takes some getting used to, but it works.

16.2.5. Functions

Let's see how we can organize a Perl program using functions, which are called subroutines in Perl.

sub times2 {
  my $x = shift(@_);
  return 2 * $x;
}
print times2(20), times2 55;

-> 40 110

Now, this is different from what we've seen in the previous chapters. Perl traditionally does not support formal function parameters, even though one of its ancestors, nawk (the "new awk"), does. Instead, Perl adopts a shell-like approach and passes the arguments to the function as an array. Inside the function you can access the argument array through the implicit variable @_. In other words, Perl makes variable argument lists, which are the exception in most languages, the rule. Without a formal parameter list, a function definition consists of just the keyword sub, the function name, and the function body, which is a block of statements enclosed in curly braces. The first statement introduces a lexical variable $x and assigns the first element of the argument array to it (shift removes and returns the first element). Since @_ is the default array for shift inside a function, we can just write shift.

sub times2 {
  my $x = shift;
  return 2 * $x;
}

We could also access this element using the index expression $_[0], but the shift operation is the more idiomatic way to access function arguments. With two arguments, we get two identical-looking variable declarations.

sub add {
    my $x = shift;
    my $y = shift;
    return $x + $y;
}

Even shorter, you can assign the argument array to a list of lexical variables representing the arguments. This has the advantage that the argument array @_ is not changed.

sub add {
    my ($x, $y) = @_;
    return $x + $y;
}

So, in reality the missing formal argument list is always simulated with some idiomatic use of local variables.

How are arguments passed to a function? Since the arguments are passed as an array, the semantics are identical to the construction of arrays. If we pass an array, it will be unpacked and supplied to the function as individual elements in the argument array.

sub showArgs { print "args:", @_; }

showArgs(1, 2, 3);
showArgs(1, ("a", "b", "c"), 2);

-> args: 1 2 3
args: 1 a b c 2
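If the arrays should arrive as separate units, one option is to pass references instead. The following is a minimal sketch; the helper lengths and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: expects two array references and returns both lengths.
sub lengths {
    my ($first, $second) = @_;
    return (scalar(@$first), scalar(@$second));
}

my @x = (1, 2, 3);
my @y = ("a", "b");
my @flat = (@x, @y);              # flattened: one list of five elements
my @lens = lengths(\@x, \@y);     # references keep the two arrays apart
print scalar(@flat), " @lens\n";
```

This prints "5 3 2": the flattened list has lost the boundary between the two arrays, while the function receiving references can still tell them apart.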

Another surprising property of Perl's argument passing is that arguments are always passed by reference: the elements of @_ are aliases for the actual arguments. This means that a function can change the value of variables passed to it.

sub change {
    for my $i (@_) {
        $i *= 2;
    }
}

my @a = (1, 2, 3);
change @a;
print @a;

my $a = 55;
change $a;
print $a;

-> 2 4 6
110

Using references, functions can be passed around as variables and arguments to functions.

my $rs = \&times2;
print $rs, &$rs(5);

-> CODE(0xa01f6c0) 10

Using a function reference we can implement the reduce function.

sub reduce {
    my $f = shift;
    my $initial = shift;
    my @list = @_;
    my $result = $initial;
    foreach my $i (@list) {
        $result = &$f($result, $i);
    }
    return $result;
}

sub add { return $_[0] + $_[1]; }

print "add", reduce(\&add, 10, (1, 2, 3));

my $mult = sub { $_[0] * $_[1] };
print "mult", reduce($mult, 1, (1 .. 5));

-> add 16
mult 120

The second application of reduce uses an anonymous function reference for the multiplication. Such a function reference works like a lambda expression. Its syntax is extremely simple: just omit the name of the function (I consider this one of Perl's highlights).

Using anonymous functions, it is even possible to return functions from functions.

sub adder {
    my $x = shift;
    return sub { my $y = shift; return $x + $y; }
}

my $add100 = adder 100;
my $add200 = adder 200;
print $add100, &$add100(5);
print $add200, &$add200(5);

-> CODE(0xa0200c4) 105
   CODE(0xa020100) 205

Note that the different calls to adder really create different versions of the function, each carrying the context in which it was created (here the value of the lexical variable $x which was passed as an argument to adder). In other words, the returned function references are closures.
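That each closure carries its own private context becomes more visible when the captured variable is modified. A minimal sketch, with the made-up function make_counter:

```perl
use strict;
use warnings;

# Hypothetical example: each call to make_counter creates its own $count.
sub make_counter {
    my $count = shift;
    return sub { return $count++; };
}

my $c1 = make_counter(1);
my $c2 = make_counter(10);
my @values = (&$c1(), &$c1(), &$c2(), &$c1());
print "@values\n";
```

This prints "1 2 10 3": the two counters advance independently because each anonymous function refers to the $count of the call that created it.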

With this knowledge, defining the compose functional is not a big step anymore.

sub compose {
    my ($f, $g) = @_;
    return sub { &$f(&$g(@_)); };
}

my $h = compose(\&times2, \&add);
print "compose", &$h(3, 4);

-> compose 14

We pass the two function references as the arguments $f and $g to the compose function. It returns a reference to an anonymous function which first applies $g to the arguments of the anonymous function and then $f to the result of $g. To apply the functions, we have to dereference the function references with the prefix syntax &$f(...). Similarly, when applying the composed function, we have to dereference the anonymous function reference $h returned by the compose function. We could also have printed the result directly using a dereference block.

print "compose", &{compose(\&times2, \&add)}(3, 4);