OpenSource For You

Lex: Waving the Magic Wand of Regular Expressions

Lex is a computer program that generates lexical analysers. It is commonly used with the Yacc parser generator. Originally distributed as proprietary software, some versions of Lex are now open source.


I would like to start by saying that the creator is as important as the creation. Lex and Yacc are creators of compilers and interpreters. To be specific, Lex is a lexical analyser generator used for the first phase of compilation, the lexical analysis phase, while Yacc is a parser generator used for the second phase, parsing. You may feel that since you are not into compilers, this article is of no use to you. Yet, if you use pattern matching algorithms, this article will help you.

The lexical analysis phase involves the processing of words, the core of which is the identification of patterns in the input. You might have used different pattern matching algorithms. When searching for a single pattern in C, a nested loop containing a conditional statement may be required, and even that may not be simple to get right. Ten patterns may need a minimum of 10 conditional statements, even if a single line of code is enough to check each pattern. This is where Lex comes in with the wand of regular expressions.
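To make the hand-written approach concrete, here is a minimal sketch in C of a search for one fixed pattern, using a nested loop and a conditional statement. The function name find_pattern is hypothetical and chosen only for illustration; it is not part of Lex or of any library.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration: return the index of the first
 * occurrence of pattern in text, or -1 if the pattern does not occur.
 */
long find_pattern(const char *text, const char *pattern)
{
    size_t n = strlen(text);
    size_t m = strlen(pattern);

    /* Outer loop: try every position where the pattern could start. */
    for (size_t i = 0; i + m <= n; i++) {
        size_t j;

        /* Inner loop: compare the pattern character by character. */
        for (j = 0; j < m; j++) {
            if (text[i + j] != pattern[j])
                break;          /* mismatch, move to the next start */
        }
        if (j == m)
            return (long)i;     /* every character matched */
    }
    return -1;
}
```

Calling find_pattern("Sharon is 27 years old.", "27") returns 10, the position of the digit '2'. This handles only one literal string; each additional pattern, or any pattern with variable parts, needs more hand-written code of this kind.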

Lex was developed by Mike Lesk and Eric Schmidt in 1975. According to them, Lex is a program generator designed for lexical processing of character input streams. This is done with the help of regular expressions. Lex identifies the input strings based on the regular expressions specified in the program. Sample programs in this article were run using Flex. Depending on the version used, commands and programs may vary slightly.

Installing Lex

Installation in Linux: The original Lex is proprietary software, though open source alternatives to it are available. Flex, the fast lexical analyser generator, is one such package. To install Flex on a Debian-based distribution, type the following command in the terminal: sudo apt-get install flex

Installation in Windows: In Windows, installation is done by simply downloading the package and running the installer. When you install, make sure that the package is not installed in the Program Files folder.

Regular expressions

Before going into the details of Lex, a quick overview of regular expressions is required. The name itself implies that the regularity of something can be expressed using a regular expression. Let’s suppose we have the following set of sentences:

Sharon is 27 years old.

Stephy is 29 years old.

Sunny is 56 years old.

Shyamala is 46 years old.

We can see that in all the above sentences, the first word begins with 'S' followed by a sequence of characters. The next word is 'is', then comes a two-digit number, which is followed by 'years old'. This pattern can be represented using a regular expression such as:

S[a-z]+ is [0-9]{2} years old.

Here, S[a-z]+ denotes that one or more occurrences of any character from 'a' to 'z' can follow 'S', and [0-9]{2} represents any two-digit number. Note that blank spaces are used to separate each word in the sentences and are used similarly in the regular expression also. Strictly speaking, the final '.' in the expression matches any single character; to match only a full stop, it can be written as '\.'.
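The same pattern can be tried out in C with the POSIX regular expression functions regcomp(), regexec() and regfree() from <regex.h>. This is a minimal sketch and not a Lex program. The function name matches_age_sentence is hypothetical, and the '^' and '$' anchors and the escaped full stop are additions made here so that only a complete sentence matches.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration: return 1 if the whole line matches
 * the pattern, 0 if it does not, and -1 if the pattern fails to compile.
 */
int matches_age_sentence(const char *line)
{
    /* '^' and '$' anchor the match to the whole line; "\\." is a literal '.' */
    static const char pattern[] = "^S[a-z]+ is [0-9]{2} years old\\.$";
    regex_t re;
    int rc;

    /* REG_EXTENDED enables '+' and '{2}'; REG_NOSUB skips match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, line, 0, NULL, 0);   /* 0 means the line matched */
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

With this sketch, "Sharon is 27 years old." and "Sunny is 56 years old." are accepted, while "Sharon is 7 years old." is rejected because the number has only one digit. One expression replaces the hand-written loops and conditionals that a fixed-string search would need.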

According to Wikipedia, a regular expression is a sequence of characters that define a search pattern, mainly for
