Lex: Waving the Magic Wand of Regular Expressions
Lex is a computer program that generates lexical analysers. It is commonly used with the Yacc parser generator. Originally distributed as proprietary software, some versions of Lex are now open source.
I would like to start by saying that the creator is as important as the creation. Lex and Yacc are creators in this sense: they generate key components of compilers and interpreters. To be specific, Lex is used in the first phase of compilation, the lexical analysis phase, as a lexical analyser generator, while Yacc is used in the second phase, parsing, as a parser generator. You may feel that since you are not into compilers, this article is of no use to you. Yet, if you ever need to do pattern matching, this article will help you.
The lexical analysis phase involves the processing of words, the core of which is the identification of patterns in the input. You might have used different pattern matching algorithms. Even when searching for a single pattern in C, a nested loop containing a conditional statement may be required, and the code is not always that simple. Searching for ten patterns needs at least ten conditional statements, even if each pattern can be checked with a single line of code. This is where Lex comes in with the wand of regular expressions.
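To make the cost of hand-written matching concrete, here is a minimal sketch in C of the nested loop described above, searching for one fixed pattern. The function name find_pattern is hypothetical and chosen only for this illustration; it is not part of Lex or of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index of the first occurrence of
 * pattern in text, or -1 if the pattern does not occur. */
long find_pattern(const char *text, const char *pattern)
{
    size_t n = strlen(text);
    size_t m = strlen(pattern);

    if (m > n)
        return -1;

    /* Outer loop: try every possible starting position in the text. */
    for (size_t i = 0; i + m <= n; i++) {
        size_t j;

        /* Inner loop: compare the pattern character by character. */
        for (j = 0; j < m; j++) {
            if (text[i + j] != pattern[j])
                break;              /* mismatch, try the next position */
        }
        if (j == m)
            return (long)i;         /* every character matched */
    }
    return -1;
}
```

A call such as find_pattern("Sharon is 27 years old.", "27") locates a single fixed string. Each additional pattern needs its own call and its own conditional, and a pattern such as "any two digits" cannot be expressed this way at all.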
Lex was developed by Mike Lesk and Eric Schmidt in 1975. According to them, Lex is a program generator designed for lexical processing of character input streams. This is done with the help of regular expressions. Lex identifies the input strings based on the regular expressions specified in the program. Sample programs in this article were run using Flex. Depending on the version used, commands and programs may vary slightly.
Installing Lex
Installation in Linux: The original Lex was distributed as proprietary software, but open source alternatives to it are available. Flex, the fast lexical analyser generator, is one such package. To install Flex on a Debian-based distribution such as Ubuntu, type the following command in the terminal:
sudo apt-get install flex
Installation in Windows: In Windows, installation is done by simply downloading the package and running the program. When you install, make sure that the package is not installed in the folder Program Files.
Regular expressions
Before going into the details of Lex, a quick overview of regular expressions is required. The name itself implies that the regularity of something can be expressed using a regular expression. Let’s suppose we have the following set of sentences:
Sharon is 27 years old.
Stephy is 29 years old.
Sunny is 56 years old.
Shyamala is 46 years old.
We can see that in all the above sentences, the first word begins with 'S' followed by a sequence of characters. The next word is 'is', then comes a two-digit number, which is followed by 'years old'. This pattern can be represented using a regular expression such as:
S[a-z]+ is [0-9]{2} years old.
Here, S[a-z]+ denotes that 'S' must be followed by one or more lowercase letters from 'a' to 'z', and [0-9]{2} represents any two digits. Note that blank spaces separate the words in the sentences and are used in the same way in the regular expression. Strictly speaking, an unescaped '.' in a regular expression matches any single character; to match only a literal full stop, it is written as \.
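The expression can be tried out directly in C before turning to Lex. The sketch below assumes a system that provides the POSIX header regex.h (Linux does; a stock Windows compiler may not). The function name matches_age_sentence is hypothetical, and the ^ and $ anchors and the escaped full stop are additions made here so that the whole line must match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the whole line matches the
 * "S... is NN years old." pattern, 0 otherwise. */
int matches_age_sentence(const char *line)
{
    /* ^ and $ anchor the match to the whole line (an addition for this
     * sketch); \\. in the C string is the regex \. for a literal full stop. */
    const char *pattern = "^S[a-z]+ is [0-9]{2} years old\\.$";
    regex_t re;
    int ok;

    /* REG_EXTENDED enables + and {2}; REG_NOSUB skips sub-match capture. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                   /* the pattern failed to compile */

    ok = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, all four sample sentences are accepted, while a line such as "Sharon is 7 years old." is rejected because it has only one digit. A single expression replaces what would otherwise be several hand-written loops and conditionals.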
According to Wikipedia, a regular expression is a sequence of characters that define a search pattern, mainly for