Regular Expressions in Programming Languages: PHP and the Web

This is the fourth article in the series on regular expressions. In the past three articles, we have discussed regular expression styles in Python, Perl and C++. Now, we will explore regular expressions in PHP.

OpenSource For You - Developers - By: Deepu Benson

The author is a free software enthusiast whose area of interest is theoretical computer science. He maintains a technical blog at www.computingforbeginners.blogspot.in and can be reached at deepumb@hotmail.com.

Let me first introduce the PHP environment before discussing regular expressions in it. This basic introduction to PHP will be sufficient even for non-practitioners to try out the regular expressions discussed here. Even if you’re not interested in PHP, the regular expressions discussed here will definitely interest you.

So let us start with the expansion of PHP. Earlier, PHP stood for ‘Personal Home Page’. That has now been replaced with the recursive backronym ‘PHP: Hypertext Preprocessor’. PHP was developed by Rasmus Lerdorf in 1994, and now the PHP development team is responsible for producing the PHP reference implementation. The standard PHP interpreter is free software released under the PHP License. PHP can be called a general-purpose programming language but it is mostly used for Web development as a server-side scripting language. The latest version is PHP 7.1, which was released in December 2016.

Standalone PHP scripts

PHP is mostly used for server-side scripting. Hence, most of the time you will see PHP scripts embedded inside HTML. I am sure all of you have heard about HTML (Hypertext Markup Language), which is used as a markup language for creating Web pages. Even if you are an absolute beginner in HTML, there’s no need to worry. You won’t need any specific HTML skills to understand the regular expressions in this article. Even though PHP is almost always paired with HTML, you can still have standalone PHP scripts running offline on your machine. It is a bit unusual, though, to use PHP to develop an application that only works offline. You may find some other programming languages that work better than PHP for such purposes.

The first PHP script we are going to run is a standalone PHP script called first.php shown below.

<?php
echo 'I don\'t depend on HTML always';
?>

Execute the command php -f first.php in a terminal to run the script first.php. This and all the other PHP scripts and HTML files discussed in this article can be downloaded from opensourceforu.com/article_source_code/October17PHP.zip. It is also possible to make PHP scripts executable. Consider the slightly modified PHP script called second.php shown below.

#!/usr/bin/php
<?php
echo 'I don\'t depend on HTML always';
?>

Execute the command ./second.php in a terminal to run the script second.php. But before doing this, make sure that you have a PHP executable on your system. Sometimes this executable named ‘php’ may not be present in the directory /usr/bin. In that case, find the path to the ‘php’ executable and replace the line of code #!/usr/bin/php with the line of code #!/YOUR_PATH_TO_PHP/php. Also, make sure that you have the execute permission for the file second.php. Figure 1 shows the outputs of the two PHP scripts first.php and second.php.

The ‘Hello World’ script in PHP

In each of the articles in this series, I have discussed a different programming language but I never had the chance to discuss a ‘Hello World’ program. So here it is. The ‘Hello World’ script in PHP, embedded inside HTML and called hello.php, is shown below:

<html>
<head>
<title>Hello World Script PHP</title>
</head>
<body>
<?php echo '<b> Hello World </b>'; ?>
</body>
</html>

But to run this PHP script, you need a Web server like Apache. My system has XAMPP, which is a free and open source Web server solution stack that provides Apache HTTP Server and MariaDB, a database. XAMPP also includes interpreters for PHP and Perl scripts. Make sure you have Apache HTTP Server available on your system by using XAMPP or a similar LAMP based Web server solution stack. From this point onwards, I assume all of you have XAMPP on your system. Even if you are using a different Web server, it will not affect the output of the PHP scripts in this article. Just make sure that you know how to run PHP scripts with your Web server.

Now if you have XAMPP, use the command sudo /opt/lampp/lampp start in a terminal to start the XAMPP service. Of course, you will need root privileges to do this. After this, open a Web browser and type ‘localhost’ on the address bar. If the XAMPP service is running, you will see the welcome page of XAMPP. To run the PHP script hello.php, copy it into the directory /opt/lampp/htdocs. All the PHP scripts discussed in this article, except first.php and second.php, should be copied into this directory because we need a Web server to process them. In the case of first.php and second.php, this is not necessary because they are standalone PHP scripts and can be executed from anywhere. Now, on the address bar of the Web browser, type localhost/hello.php. You will see the Web browser displaying the message ‘Hello World’ in bold. Figure 2 shows the output of the PHP script hello.php in the Mozilla Firefox Web browser.

Now let us examine the script hello.php in detail. Most of the HTML tags used in the script, like <html>, <head>, <title>, <body>, etc, are self-explanatory; so let us not waste time worrying about them. The PHP interpreter parses the PHP part of the script, which starts with the opening tag <?php and ends with the closing tag ?>. Between these tags you can have PHP statements separated by semicolons. The line of PHP code ‘echo '<b> Hello World </b>';’ passes the output ‘<b> Hello World </b>’ to the body of the HTML page. The Web browser then processes this further by interpreting the HTML tag <b>, which specifies bold text. This is why bold text is displayed on the Web browser as shown in Figure 2.

Regular expressions in PHP

Now that we know how to set up a server and run PHP scripts, it is time for us to discuss regular expressions in PHP. There are three sets of regular expression functions in PHP to choose from. These are the preg functions, mb_ereg functions and ereg functions. Out of these three, we will be discussing just one set of functions used for regular expression processing, the preg functions.

There are some good reasons to choose preg functions over the other two. First of all, preg is PCRE based. We have already discussed the PCRE (Perl Compatible Regular Expressions) style in detail in the first two articles in this series. Those articles covered Python and Perl, both of which use Perl-style regular expressions that are largely compatible with PCRE. So, it is wise to use this style because then it is not necessary to discuss the syntax of the regular expressions used in PHP. All you have to do is refresh the syntax you learned while studying regular expressions in Python and Perl. This is one point in favour of preg functions, while there are some drawbacks with the other two sets of functions.

The mb_ereg functions are more complicated and are useful only if we are processing multi-byte character sets. We will come across multi-byte character sets only when processing languages like Korean, Japanese or Chinese that have a huge number of characters. As an aside, let me add that unlike most other languages, which use an alphabet with a fixed number of characters, these languages have tens of thousands of logograms to represent different words.

Now, let us come back to our business; it would be unnecessary to burden learners by discussing the mb_ereg set of functions with no real benefit in sight. And what disqualifies the ereg set of functions? Well, they are the oldest set of functions, but they were officially deprecated from PHP 5.3.0 onwards and removed altogether in PHP 7.0. Since we have decided to stick with the preg set of functions in PHP to handle regular expressions, we don’t need any further discussion regarding the syntax, because we are already familiar with the PCRE syntax.

The main functions offered by the preg regular expression engine include preg_match(), preg_match_all(), preg_replace(), preg_split(), and preg_quote(). The function preg_match() can give different results based on the number of parameters used in it. In its simplest form, the function can be used with just two parameters as preg_match($pat, $str). Here, the regular expression pattern is stored in the variable $pat and the string to be searched is stored in the variable $str. This function returns 1 if the given pattern is present in the string and 0 if no match is found; it returns false only if an error occurs. Inside an if statement, 1 and 0 behave as true and false respectively.
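To make these return values concrete, here is a minimal sketch that can be run as a standalone script with php -f. The sample strings and variable names are my own assumptions and are not part of the downloadable scripts.

```php
<?php
// Sketch only: sample data chosen for illustration.
$str = 'Open Source For You';

// preg_match() returns int 1 on a match, int 0 on no match, false on error.
$found   = preg_match('/You/', $str);    // 1
$missing = preg_match('/Them/', $str);   // 0

// preg_match_all() returns the number of full matches; $all[0] lists them.
// The i modifier makes the match case-insensitive.
$count = preg_match_all('/o/i', $str, $all);

// preg_split() cuts the string wherever the pattern matches.
$words = preg_split('/\s+/', $str);

// preg_quote() escapes regex metacharacters so text can be matched literally.
$literal = preg_quote('1.5+2');

echo "found=$found missing=$missing count=$count\n";
echo implode('|', $words), "\n";
echo $literal, "\n";
```

Comparing the result with === 1 rather than relying on truthiness is a reasonable habit, since it keeps a genuine ‘no match’ (0) apart from an error (false).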

A simple PHP script using regular expressions

Now that we have some idea about the regular expression syntax and the working of one function in the preg set of functions, let us consider the simple PHP script called regex1.php shown below:

<html>
<body>
<?php
$pat = '/You/';
$str = 'Open Source For You';
if(preg_match($pat,$str))
{
    echo '<b> Match Found </b>';
}
else
{
    echo 'No Match Found';
}
?>
</body>
</html>

To view the output of this script, open a Web browser and type localhost/regex1.php on the address bar. The message ‘Match Found’ will be displayed on the Web browser in bold text. This script also tells us how the function preg_match() searches for a match. The function searches the entire string to find a match. Let us analyse the script regex1.php line by line. The HTML part of the code is straightforward and doesn’t need any explanation. In the PHP part of the script, we have used two variables $pat and $str. The pattern to be matched is stored in the variable $pat by the line of code ‘$pat = '/You/';’. Here we are going for a direct match for the word ‘You’. As you might have observed, the delimiters of the regular expression are a pair of forward slashes (/). The variable $str contains the string which is searched for a possible match, and it is set by the line of code ‘$str = 'Open Source For You';’. The next few lines of code have an if-else block to print some messages depending on the condition of the if statement.

In the line of code ‘if(preg_match($pat,$str))’ the function preg_match() returns 1 if there is a match, which the if statement treats as true, and returns 0 if there is no match, which is treated as false. In case of a match, the line of code ‘echo '<b> Match Found </b>';’ inside the if block will print the message ‘Match Found’ in bold text. In case there is no match, the line of code ‘echo 'No Match Found';’ in the else block will print the message ‘No Match Found’.

It is also possible to call the function preg_match() with three parameters as preg_match($pat, $str, $val) where the array variable $val receives the matched string. Consider the PHP script regex2.php shown below:

<?php
$pat = '/b+/';
$str = 'aaaabbbbaaaa';
if(preg_match($pat,$str,$val))
{
    $temp = $val[0];
    echo "<b> Matched string is $temp </b>";
}
else
{
    echo 'No Match Found';
}
?>

To view the output of this script, open a Web browser and type ‘localhost/regex2.php’ on the address bar. The message ‘Matched string is bbbb’ will be displayed on the Web browser in bold text. This also tells us that the quantifier + in the pattern is greedy, so preg_match() returns the longest possible run of the letter b. Thus, the function does not match the strings b, bb, or bbb; instead bbbb is the matched string.
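The greedy behaviour is easier to see next to its opposite. The following sketch is my own addition and not one of the downloadable scripts. It compares the greedy quantifier + with its lazy PCRE counterpart +? on the same input string.

```php
<?php
$str = 'aaaabbbbaaaa';

// + is greedy: it takes as many b characters as it can.
preg_match('/b+/', $str, $greedy);

// +? is lazy: it stops as soon as the pattern is satisfied.
preg_match('/b+?/', $str, $lazy);

echo $greedy[0], "\n";  // bbbb
echo $lazy[0], "\n";    // b
```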

The variable $val[0] contains the entire text matched by the regular expression pattern. At this point, I should also mention the difference between strings inside single quotes and double quotes in PHP. Strings inside single quotes are treated literally, whereas inside double quotes, variable names are replaced with the contents of the variables.
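Here is a short sketch of both points, assuming an input string and a pattern of my own choosing. It shows what ends up in the different elements of $val when the pattern contains groups, and how the two quoting styles treat the same variable.

```php
<?php
// Illustrative input; the pattern and names are assumptions for this sketch.
$str = 'PHP 7.1 was released in December 2016';
$pat = '/(\w+) (\d{4})/';   // group 1: a word, group 2: a four-digit number

preg_match($pat, $str, $val);
// $val[0] holds the whole match; $val[1] and $val[2] hold the two groups.

$literal      = 'Matched: $val[0]';   // single quotes: no substitution
$interpolated = "Matched: $val[0]";   // double quotes: variable is expanded

echo $literal, "\n";       // Matched: $val[0]
echo $interpolated, "\n";  // Matched: December 2016
echo "Month: {$val[1]}, year: {$val[2]}\n";  // Month: December, year: 2016
```

Writing patterns inside single quotes, as the scripts in this article do, keeps backslash sequences like \d and \w exactly as typed.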

Other functions in preg

There are many other useful functions offered by the preg family of functions in PHP for regular expression processing, apart from preg_match(). But we will only discuss one very useful function called preg_replace(), which replaces the matched string with another string. The function can be used with three parameters as follows: preg_replace($pat, $rep, $str) where $pat contains the regular expression pattern, $rep contains the replacement string, and $str contains the string to be searched for the pattern. Consider the PHP script regex3.php shown below:

<?php
$pat = '/World/';
$rep = 'Friends';
$str = 'Hello World';
if(preg_match($pat,$str))
{
    $str = preg_replace($pat,$rep,$str);
    echo "<b> The modified string: $str </b>";
}
else
{
    echo 'No Match Found';
}
?>

The function preg_replace() will not modify the contents of the variable $str as such. Instead, the function will only return the modified string. In this example, the line of code ‘$str = preg_replace($pat,$rep,$str);’ replaces the word ‘World’ with the word ‘Friends’, and this modified string is explicitly stored in the variable $str. To view the output of this script, open a Web browser and type localhost/regex3.php on the address bar. The message ‘The modified string: Hello Friends’ will be displayed on the Web browser in bold text. In the case of both regex2.php and regex3.php, I have only shown the PHP portion of the scripts for want of space, but the complete scripts are available for download.
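A few more features of preg_replace() are worth a quick sketch. The strings below are my own examples and are not taken from regex3.php. The function replaces every match by default, it can report the number of replacements through its optional fifth parameter, and the replacement string can refer to captured groups as $1, $2 and so on.

```php
<?php
$str = 'Hello World, big World';

// No preg_match() guard is required: with no match, the subject is returned unchanged.
$same = preg_replace('/Mars/', 'Friends', $str);

// The optional 4th argument limits the replacements (-1 means no limit) and
// the optional 5th receives the number of replacements actually made.
$all = preg_replace('/World/', 'Friends', $str, -1, $count);

// $1 and $2 in the replacement refer to the first and second captured groups.
$swapped = preg_replace('/(\w+) (\w+)/', '$2 $1', 'Hello World');

echo $all, " ($count replacements)\n";  // Hello Friends, big Friends (2 replacements)
echo $swapped, "\n";                    // World Hello
```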

A regular expression for validating numbers

Now we are going to look at how our knowledge of regular expressions will help us validate numbers using PHP.

The aim is to check whether the given number entered through a text box in an HTML page is an integer or a real number, and print the same on the Web page in bold text. If the input text is neither an integer nor a real number, then the message ‘Not a number’ is displayed on the Web page in bold text. But remember, this statement is factually incorrect, as mathematicians will be eager to point out that the input text could still be a number by being an irrational number like π (Pi) or a complex number like 5 + 10i. It could even be a quaternion or an octonion, which belong to even more bizarre number systems. But I think that for most practising computer scientists, integers and real numbers are sufficient most of the time. To achieve this, we have two scripts called number.html and number.php. The script number.html is shown below:

<html>
<body>
<form action="number.php" method="post">
Enter a Number:
<input type="text" name="number">
<input type="submit" value="CLICK">
</form>
</body>
</html>

The script number.html reads the number in a text field, and when the submit button (labelled CLICK) is pressed, the script number.php is invoked. The input data is then passed to the script number.php by using the POST method for further processing. At this point, also remember the naming convention of HTML files. If the HTML file contains embedded PHP script, then the extension of the file is .php, and if there is no embedded PHP script inside an HTML file, then the extension of the file is .html. The script number.php is shown below.

<html>
<body>
<?php
$pat1 = '/(^[+-]?\d+$)/';
$pat2 = '/(^[+-]?\d*\.\d+$)/';
$str = $_POST["number"];
if(preg_match($pat1,$str))
{
    echo '<b> Integer </b>';
}
elseif(preg_match($pat2,$str))
{
    echo '<b> Real Number </b>';
}
else
{
    echo '<b> Not a number </b>';
}
?>
</body>
</html>

The HTML section of the file only contains the tags <html> and <body> and their meaning is obvious. But the PHP script in the file requires some explaining. There are two regular expression patterns defined by the PHP script, stored in the variables $pat1 and $pat2. If you examine the two regular expression patterns carefully, you will understand the benefits of using preg, which is based on PCRE. I have reused the same regular expression patterns we discussed in the earlier article dealing with Perl. The line of code ‘$pat1 = '/(^[+-]?\d+$)/';’ defines a regular expression pattern that matches any integer. Even integers like +111, -222, etc, will be matched by this regular expression.

The next line of code ‘$pat2 = '/(^[+-]?\d*\.\d+$)/';’ defines a regular expression pattern that matches real numbers. Here again, we are only identifying a subset of real numbers called rational numbers. But then again, let us not be too mathematical. For a detailed discussion of these regular expressions, refer to the earlier article on Perl in this series. The best part is that any regular expression we developed there can be used in PHP without making changes. I have, however, chosen to make a slight change in the second regular expression pattern /(^[+-]?\d*\.\d+$)/ to accommodate real numbers of the form .333 also. The original Perl regular expression was /(^[+-]?\d+\.\d+$)/ which will only validate real numbers like 0.333 and not .333.

The next line of code ‘$str = $_POST["number"];’ reads the input data submitted by the form in number.html and stores it in the variable $str. The next few lines of code contain an if-else block which matches the input text with the two regular expression patterns. The function preg_match() is used in the if statement and the elseif statement to search for a match. Depending on the results of these matches, the PHP script prints the suitable message in bold text in the Web browser. To view the output of the HTML script, open a Web browser and on the address bar, type localhost/number.html. The resulting HTML page is shown in Figure 3. Enter a number in the text field and press the submit button. You will see one of the three possible output messages on the Web page: ‘Integer’, ‘Real Number’, or ‘Not a number’. Figure 4 shows the output obtained when the number -222.333 is given as input.
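If you would like to try these two patterns without a Web server, the same logic can be wrapped in a function and run as a standalone script. The sketch below is my own rearrangement and not one of the downloadable scripts. The function name classify_number() is hypothetical, and the ?? operator, which assumes PHP 7 or later, supplies an empty string when no form data has been posted.

```php
<?php
// Hypothetical helper wrapping the two patterns from number.php.
function classify_number(string $input): string
{
    // One or more digits with an optional sign.
    if (preg_match('/^[+-]?\d+$/', $input) === 1) {
        return 'Integer';
    }
    // Optional sign, optional integer part, a dot, then at least one digit.
    if (preg_match('/^[+-]?\d*\.\d+$/', $input) === 1) {
        return 'Real Number';
    }
    return 'Not a number';
}

// In a Web page the input comes from the form; from the command line
// $_POST is empty, so fall back to an empty string.
$str = $_POST['number'] ?? '';

foreach (['+111', '-222.333', '.333', '5.', 'abc'] as $sample) {
    echo $sample, ' => ', classify_number($sample), "\n";
}
```

Keeping the check in a function makes it easy to test inputs such as ‘5.’, which is rejected because the second pattern insists on at least one digit after the decimal point.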

Now that we have discussed a useful regular expression, it is time to wind up the article. Here, I have discussed the programming language PHP almost as much as the regular expressions in it. I believe the whole point of this series is to explore how regular expressions work in different programming languages by analysing the features of those programming languages, rather than discussing regular expressions in a language-agnostic way. And now that we have covered PHP regular expressions, I am sure you will have some idea about using regular expressions on the server side. But what about regular expressions on the client side? In the last example, the validation could have been done on the client side itself rather than sending the data all the way to the server. So, in the next article in this series, we will discuss the use of regular expressions in JavaScript, a client-side scripting language.

Figure 1: Output of standalone PHP scripts

Figure 2: ‘Hello World’ in PHP

Figure 3: HTML page from number.html

Figure 4: Output of number.php
