Perl: Deciphering old (manu)scripts

Perl was meant to be practical, not beautiful. And it is – just don’t run one-liners you don’t understand (yet).

Linux Format – TUTORIAL

We know what you are thinking now: “C’mon, who needs Perl at the end of 2018?” It has a reputation for being barely readable (if not write-only) and clumsy, and it has been largely superseded by Python and friends. While it’s probably true that you don’t want to start a new project in Perl unless you have some specific requirements, you can still come across it in the real world. Debhelper is mostly Perl, and sometimes you have to read dh_something to learn why it works the way it does. SpamAssassin (https://spamassassin.apache.org) and Shorewall (http://shorewall.org) are Perl as well. Last but not least, you can run a Perl script to generate beautiful flame graphs (https://github.com/brendangregg/FlameGraph), although we’ve never had to figure out how it is done. Not to mention that Perl shines in one-off text-processing tasks, thanks to its clear, seamlessly integrated regular expressions.

Properly written Perl code isn’t hard to read and reason about. Perl may well favour cryptic code, but that doesn’t mean you should write it. Most Perl snippets we come across these days should be clear to anyone who has spent ten minutes or so learning some basic concepts. That’s exactly what we are going to do today.

Volatile variables

Perl programs are no different from any other programs you’ve seen. They also contain variables and expressions, functions (or subroutines – yup, Perl really is an old thing), if statements, for loops and so on.

When it comes to variables, Perl distinguishes between scalars (single values) and lists. You can easily tell something is a scalar, as it starts with a dollar sign. The same goes for individual array items, whether the array is ordinary or associative (aka a hash table or simply a hash):

my $index = 1;
print $array[$index];
print $hash{"key_$index"};

You use square brackets to index arrays and curly brackets for hashes. Array indexes are integers; hash keys are strings, and anything else you use as a key is turned into one. Also note that Perl interpolates variables inside double-quoted strings: Python had to wait until version 3.6 and f-strings for that (see PEP 498).
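To see the sigils and brackets side by side, here is a minimal, self-contained sketch. The array and hash contents are made up for illustration; only the access syntax matters.

```perl
use strict;
use warnings;

# Made-up data for illustration
my @array = ('zero', 'one', 'two');
my %hash  = (key_1 => 'first value', key_2 => 'second value');
my $index = 1;

# A single element is a scalar, so it takes the $ sigil
my $item  = $array[$index];         # square brackets index an array
my $value = $hash{"key_$index"};    # curly brackets look up a hash key

# Double quotes interpolate variables; single quotes keep them literal
my $double = "Item $index is $item";
my $single = 'Item $index is $item';

print "$double\n", "$single\n", "$value\n";
```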

Indexing an element which doesn’t exist in an array or hash is legal. No exception is thrown (Perl’s closest thing to exceptions is die, paired with eval); you simply get an undefined value, undef. If you need to be sure the key is present, use the exists function to check.
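A quick sketch of what “legal but undefined” looks like in practice, and of why exists is not the same as checking the value. The data is invented for the example.

```perl
use strict;
use warnings;

my @array = ('a', 'b');
my %hash  = (present => 0);

# Reading missing elements is legal: you simply get undef, no error
my $missing_item = $array[10];
my $missing_key  = $hash{absent};

# exists() tells "the key is there" apart from "the value is undef or false"
my $has_present = exists $hash{present} ? 1 : 0;   # 1, even though the value is 0
my $has_absent  = exists $hash{absent}  ? 1 : 0;   # 0: reading it did not create it

my $status = defined $missing_item ? 'defined' : 'undef';
print "$status\n";
```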

Now, consider the following: $hash{"key_$index"}{'foo'} = 'bar';

You may expect this to fail, since it dereferences an undefined value, but it works. Thanks to a feature called autovivification, a new hash (or array) is created as needed. And if you wonder about the single quotes, that’s how you tell Perl you want no string interpolation. qq(key_$index) and q#foo# are equivalent to "key_$index" and 'foo', respectively. Using these improperly is part of what makes Perl programs so cryptic.
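The following sketch (with made-up keys) shows autovivification kicking in for both a nested hash and an array reference, plus the q and qq forms described in the paragraph above.

```perl
use strict;
use warnings;

my %hash;
my $index = 1;

# $hash{"key_1"} doesn't exist yet, so Perl creates an anonymous hash for it
$hash{"key_$index"}{'foo'} = 'bar';
my $kind = ref $hash{"key_$index"};   # 'HASH': the inner hash was autovivified

# The same happens with arrays: $list{numbers} springs into existence
my %list;
push @{ $list{numbers} }, 42;

# q and qq are the generic forms of '' and ""; note q#...# needs no space
my $same_single = q#foo# eq 'foo';
my $same_double = qq(key_$index) eq "key_$index";
```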

So far, so good? $var is a jargon way to say ‘a variable’ in a tech discussion, so it hardly confuses anyone. However, Perl goes a bit further:

my @array = ('a', 'b', 'c');
my %hash = ('a' => 1, 'b' => 2, 'c' => 3);

Unlike PHP, Perl gives arrays and hashes distinct prefixes. In the example above, they are what really makes the difference, as => (the ‘fat comma’) works like a comma that also quotes a bareword on its left. So what does (1, 2) mean: is it a two-element array or a one-element hash? Well-written Perl programs always use => for hashes to make declarations visually different.
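A small sketch of the ambiguity: the same list feeds an array and a hash, and the two hash declarations at the end differ only in punctuation. The values are arbitrary.

```perl
use strict;
use warnings;

# The same list can initialise an array or a hash; only the sigil tells them apart
my @pair_array = (1, 2);    # two elements: 1 and 2
my %pair_hash  = (1, 2);    # one key/value pair: 1 => 2

# The fat comma also quotes the bareword on its left, so these two are equivalent
my %with_commas = ('a', 1, 'b', 2);
my %with_arrows = (a => 1, b => 2);
```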

As a final remark, note the my keyword which begins all the above declarations. It makes the variable local to a lexical scope, such as a { } block. Its cousin, our, declares package-level variables. Both are optional unless a program starts with use strict;. All proper Perl programs carry this pragma along with use warnings;, since together they help to avoid common errors such as typos.
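Here is a sketch of my, our and use strict at work. To show strict rejecting a typo without stopping the script, we compile the misspelt code inside a string eval; the variable names are hypothetical.

```perl
use strict;
use warnings;

our $VERSION = '1.0';   # package variable, also reachable as $main::VERSION
my $total = 10;         # lexical: visible until the end of the file

{
    my $total = 20;     # a new lexical that shadows the outer one in this block only
}

# A misspelt name ($cuont instead of $count) is a compile-time error under strict.
# Compiling it in a string eval records the error in $@ instead of aborting.
my $ok    = eval q{ use strict; my $count = 1; my $n = $cuont + 1; 1 };
my $error = $@;
```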

Some magic bits

Not all variables are created equal. Some you define as your program’s logic dictates; others come built in, akin to $? in Bash. These predefined variables make scripts shorter, but also less readable unless you are aware of them. Consider this:

while (<>) {
    chomp;
    next unless $_;
    # Some other code
}

Surely you’ve identified the while loop. <> is how you read from a file in Perl. Typically, you call open() to obtain a so-called filehandle, then do <F> to read a line from it. In this case, the handle is omitted, so Perl reads from the files named on the command line or, if there are none, from stdin. It’s just another implicit thing that’s worth being aware of.

So, <> reads from stdin, but where does it store the result? The answer is $_, the default input variable. When <> is the whole condition of a while loop, each line goes into $_ unless you say otherwise, as in my $line = <F>. Many functions, such as chomp(), also use this variable as their input if no argument is supplied. The chomp() function removes a trailing newline or, more precisely, a trailing copy of the $/ value. The latter contains the input record separator (like RS in awk) and defaults to a newline. So, here chomp() simply strips the newline.

The next line, next unless $_;, is different in that it shows $_ explicitly. The next command is what continue is in C-like languages: it moves a loop on to the next iteration (and, by the way, break is called last). The condition is more interesting. While all Turing-complete languages have a form of if, Perl also sports unless, which stands for – you guessed it – if not. What’s more, it comes as a suffix: compare this to if (!$_) { next }. Postfix conditionals are how you make Perl read as if it were plain English. By the way, this also holds true for variable names. The English module (use English;) gives $_ the alias $ARG, $/ the alias $INPUT_RECORD_SEPARATOR, and so on. It’s rare that a Perl script makes use of this feature (in our experience; your mileage may vary), which is a pity: it makes code longer, but far less cryptic.
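A minimal sketch of the English aliases. We pass -no_match_vars, the usual idiom, which skips the aliases for $&, $` and $' (using those slowed down every regular expression in Perls before 5.20); the sample string is made up.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # long names for punctuation variables

$_ = "hello\n";
chomp;                                      # works on $_ ...
my $via_english = $ARG;                     # ... which English also calls $ARG
my $separator   = $INPUT_RECORD_SEPARATOR;  # the same as $/, "\n" by default
```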

Now you can easily see that the while (<>) snippet is just a common wrapper to iterate over lines in a file while skipping empty ones. Another way to achieve a similar result would be to run the script with perl -n, which implicitly wraps the code in while (<>) { }, yet doesn’t chomp for you unless you add -l (perl -p does the same, but also prints each line). This reaffirms the well-known Perl motto: “There is more than one way to do it”.
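To run the wrapper without feeding it stdin, the sketch below reads from an in-memory filehandle instead; the sample text is made up. It also exposes a gotcha worth knowing when deciphering old scripts.

```perl
use strict;
use warnings;

# An in-memory "file" stands in for stdin so the sketch runs on its own
my $text = "first\n\nsecond\n0\nthird\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";

my @kept;
while (<$fh>) {        # each line lands in $_
    chomp;             # strips the trailing $/ (a newline) from $_
    next unless $_;    # skips empty lines ... and, beware, a line holding just "0"
    push @kept, $_;    # "some other code" would go here
}
close($fh);
```

The loop keeps first, second and third: the line holding 0 is dropped too, because "0" is false in Perl. If that matters, next unless length; (which tests the length of $_) skips only genuinely empty lines.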

Tiny subroutines

Back to $_. Recall that a dollar sign means a scalar. What if you use the very same name, _, for an array? This brings us to Perl’s subroutines:

sub add($$) {
    my ($op1, $op2) = @_;
    return $op1 + $op2;
}

print add(2, 2); # yup, 4

As you might have guessed by now, @_ holds a subroutine’s arguments. You may also notice that Perl supported destructuring assignment well before it became mainstream. Another idiomatic way to give @_ items a meaningful name is the shift function:

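sub add($$) {
    my $op1 = shift;    # removes and returns the first element of @_
    my $op2 = shift;    # ...and then the next one
    return $op1 + $op2;
}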

It works much like shift in Bash, removing the first element of the array and returning it. And – you guessed it again – it uses @_ if no array is specified (outside a subroutine, it falls back to @ARGV).
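Both ways of naming arguments, side by side in a sketch. The subroutine names add_list and add_shift are hypothetical, chosen so the two versions can coexist; we also leave out the prototype here to keep the focus on @_.

```perl
use strict;
use warnings;

# Unpack @_ in one go, with a list (destructuring) assignment
sub add_list {
    my ($op1, $op2) = @_;
    return $op1 + $op2;
}

# Or peel arguments off one at a time: a bare shift inside a sub works on @_
sub add_shift {
    my $op1 = shift;
    my $op2 = shift;
    return $op1 + $op2;
}

# Outside any sub, a bare shift works on @ARGV (the command-line arguments)
local @ARGV = ('first', 'second');
my $first_arg = shift;

print add_list(2, 2), ' ', add_shift(2, 3), "\n";
```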

Now, you may be wondering why you can’t name function arguments in the declaration itself, like many other languages do. We don’t know the answer (although Perl 5.20 did add experimental subroutine signatures). Do note, however, that Perl has (somewhat rudimentary and optional) function prototyping support. The two dollar signs tell the compiler that callers must supply add() with two scalar arguments. So, if you call it like this: add @array;, Perl will complain: Not enough arguments for main::add. Prototypes only take effect when the subroutine is declared before the call is compiled.
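Prototypes do more than count arguments: each $ also forces scalar context, which can surprise you. A sketch (the array is made up), again using a string eval so that the compile-time error doesn’t stop the script:

```perl
use strict;
use warnings;

# ($$): two arguments, each evaluated in scalar context.
# The prototype only applies because add() is declared before it is called.
sub add($$) {
    my ($op1, $op2) = @_;
    return $op1 + $op2;
}

my @three = (10, 20, 30);

# An array in scalar context yields its length, so this is 3 + 3, not a sum of items
my $sum_of_lengths = add(@three, @three);

# Too few arguments is a compile-time error; a string eval lets us inspect it
my $ok    = eval q{ add(@three); 1 };
my $error = $@;   # "Not enough arguments for main::add ..."
```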

There are quite a few other magical variables in Perl. For instance, $! ($ERRNO or $OS_ERROR under use English) holds the last system error; C calls this errno. It’s often encountered in expressions like this:

```
open(F, "<myfile.txt") || die "Can't open myfile.txt: $!";
```

This would terminate the program with an appropriate error message if myfile.txt is not found or otherwise unreadable.
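Modern code usually spells this idiom with a lexical filehandle, the three-argument open and the low-precedence or. The difference matters: without the parentheses, open F, "<myfile.txt" || die ... would apply || to the file name string, so die would never run. Here is a sketch, assuming a hypothetical path that doesn’t exist, wrapped in eval so we can inspect the message rather than exit:

```perl
use strict;
use warnings;

# Hypothetical path that should not exist on any normal system
my $file = '/nonexistent/myfile.txt';

my $ok = eval {
    open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    1;
};
my $message = $@;   # e.g. "Can't open /nonexistent/myfile.txt: No such file or directory"
```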

First-class pattern matching

Another area where Perl shines is regular expression support. Most languages have it today – either natively or via standard libraries – but Perl is one of the few which have regular expressions fused into the syntax. This is how you do a match:

my $target = "Hello, world!";
$target =~ /world/;

The word between the slashes is a regular expression, as in JavaScript. But unlike JavaScript (and similar to sed), almost any character, or a pair of brackets, can act as the delimiter, as we’ll see in just a moment.

You may now think that Perl uses =~ as its pattern matching operator. Not quite. In fact, it’s m (yes, a single letter): m/world/. However, pattern matching is so common in Perl that you can omit this operator to save some typing. The exception is when you want a non-standard delimiter, say m!world! or m(world), much as with q and qq. In that case, m is mandatory.
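A short sketch of the delimiter rules described above; the strings are just examples.

```perl
use strict;
use warnings;

my $target = "Hello, world!";

my $slashes  = $target =~ /world/;     # m is implied with slashes
my $bangs    = $target =~ m!world!;    # any other delimiter needs the explicit m
my $brackets = $target =~ m(world);    # bracket pairs work as well
my $missing  = $target =~ /planet/;    # a failed match returns false ("")
```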

=~ (be sure not to put a space between = and ~) is a binary binding operator. Basically, it just says to apply the operation on the right to the scalar on the left. Now, what do you think the following construct would do? /world/

While this looks like a bare literal, it’s really a Boolean expression. The m operator is implied and, without the binding operator, the match runs against $_. So, the earlier wrapper that skips empty lines in a file can also be rewritten as:

while (<>) {
    chomp;
    next if /^$/;
}

The regular expression matches an empty line. !~ negates the result of the match, so next if /^$/ is the same as next unless $_ !~ /^$/, yet the latter is really mind-bending.
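To convince yourself that the two spellings agree, here is a sketch that runs both over a made-up list. Note that, unlike next unless $_, the /^$/ test keeps a line containing just 0.

```perl
use strict;
use warnings;

my @lines = ("first", "", "0", "last");
my (@with_if, @with_unless);

for (@lines) {
    next if /^$/;              # a bare /.../ matches against $_
    push @with_if, $_;
}

for (@lines) {
    next unless $_ !~ /^$/;    # the same test, with a double negative
    push @with_unless, $_;
}
```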

Making substitutions is just as simple: you use the s operator (explicit this time) and bind it to the variable you want to modify:

$text =~ s/foo/bar/g;

The letters at the end are modifier flags. This is how you tell Perl you want the match to be case-insensitive (i) or to replace all occurrences (g), as we do here. The m operator supports these as well.
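A sketch of the flags at work on an invented sentence. To leave the original untouched, we use the (my $copy = $text) =~ s/.../.../ idiom, which copies first and then substitutes on the copy; the last line counts matches by running m//gi in list context.

```perl
use strict;
use warnings;

my $text = "Foo fighters like foo, not FOO bars";

(my $first_only = $text) =~ s/foo/bar/;     # no flags: first case-sensitive match only
(my $all_cases  = $text) =~ s/foo/bar/gi;   # g: every occurrence, i: ignore case

my $count = () = $text =~ /foo/gi;          # list-context match returns all matches
```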

Most regular expression dialects support so-called ‘capturing parentheses’ to store parts of the match. Retrieving these capture groups in code can be cumbersome, yet Perl makes it rather straightforward via magic variables:

"Total: 10 GBP" =~ /Total: (\d+) ([A-Z]+)/;
print "Your total was $1 in $2";
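Two things are worth knowing when reading code like this, shown in the sketch below with the same made-up receipt: $1 and $2 keep their old values if a match fails, so robust code checks the match first; and in list context, a match returns its captures directly.

```perl
use strict;
use warnings;

my $receipt = "Total: 10 GBP";

# Only trust $1 and $2 after a successful match
my ($amount, $currency);
if ($receipt =~ /Total: (\d+) ([A-Z]+)/) {
    ($amount, $currency) = ($1, $2);
}

# In list context, the match hands back the captures themselves
my ($amount2, $currency2) = $receipt =~ /Total: (\d+) ([A-Z]+)/;

print "Your total was $amount in $currency\n";
```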

Perl regular expressions are powerful enough to be a thing of their own. Many languages and tools sport Perl Compatible Regular Expressions (PCRE), and we tend to switch them on where available.

We hope this short introduction to Perl was enjoyable, even though it barely scratches the surface. Perhaps Perl is not that cryptic, but it’s still a sophisticated language that takes time to master. There are numerous resources to help you along the way; see the boxout on this page for starters.

Shorewall makes writing complex network access policies a simple task. Behind the scenes, it uses Perl to compile your rules.

Perl is a Spartan language, and so should your REPL be. Forget interactive shells, and go straight to the console.

Flame graphs are the new trend in performance profiling. Ironically, they rely on good old Perl to visualise raw stack traces.

Perldoc is available online, but you can also download the whole thing and browse it where the web is unavailable.
