The scanner performs lexical analysis of a source program (in our
case, a program written in Simple). It reads the source program as a
sequence of characters and recognizes "larger" textual units called
tokens. For example, if the source program contains the characters

    VAR ics142: INTEGER; // variable declaration

the scanner would produce the tokens

    VAR ID(ics142) COLON ID(INTEGER) SEMICOLON

to be processed in later phases of the compiler. Note that the scanner discards white space and comments between the tokens; they are "filtered" out and not passed on to later phases. Examples of such nontokens are comments, blanks, tabs, line feeds, and carriage returns.
FLEX (Fast LEXical analyzer generator) is a tool for generating
scanners. Instead of writing a scanner from scratch, you only need to
identify the vocabulary of a language (e.g. Simple), write a
specification of its patterns using regular expressions (e.g. DIGIT
[0-9]), and FLEX will construct a scanner for you. FLEX is generally
used as follows:
First, FLEX reads a scanner specification, either from an input file *.lex or from standard input, and generates as output a C source file, lex.yy.c. Then, lex.yy.c is compiled and linked with the flex library (the "-lfl" option) to produce an executable, a.out. Finally, a.out analyzes its input stream and transforms it into a sequence of tokens.
*.lex consists of pairs of regular expressions and C code: each pattern is paired with the C code to run when that pattern is matched (sample1.lex, sample2.lex).
lex.yy.c defines a routine yylex() that uses the specification to recognize tokens.
a.out is actually the scanner!
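The idea behind pattern/action pairs and yylex() can be imitated by hand. The sketch below is not generated by FLEX and does not use its API; toy_yylex, the rule table, and the token codes are hypothetical names chosen for illustration. Each rule pairs a matcher function (standing in for a regular expression) with the token code its action returns, and toy_yylex applies the selection policy the FLEX manual documents: the rule matching the most text wins, and among equally long matches the rule listed first wins.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; 0 means end of input, as with yylex(). */
enum { TOK_EOF = 0, TOK_VAR = 1, TOK_ID, TOK_COLON, TOK_SEMICOLON, TOK_SKIP = -1 };

/* A matcher returns the length of the longest prefix of p it matches (0 = none). */
typedef size_t (*matcher)(const char *p);

static size_t match_ws(const char *p) {        /* [ \t\r\n]+ */
    size_t n = 0;
    while (p[n] == ' ' || p[n] == '\t' || p[n] == '\r' || p[n] == '\n') n++;
    return n;
}
static size_t match_comment(const char *p) {   /* "//" up to end of line */
    size_t n = 0;
    if (p[0] != '/' || p[1] != '/') return 0;
    while (p[n] != '\0' && p[n] != '\n') n++;
    return n;
}
static size_t match_var(const char *p) {       /* "VAR" */
    return strncmp(p, "VAR", 3) == 0 ? 3 : 0;
}
static size_t match_id(const char *p) {        /* [A-Za-z][A-Za-z0-9]* */
    size_t n = 0;
    if (!isalpha((unsigned char)p[0])) return 0;
    while (isalnum((unsigned char)p[n])) n++;
    return n;
}
static size_t match_colon(const char *p) { return p[0] == ':'; }
static size_t match_semi(const char *p)  { return p[0] == ';'; }

/* The "specification": pattern/action pairs in priority order.
 * TOK_SKIP plays the role of an action that returns nothing (filtering). */
static const struct { matcher match; int token; } rules[] = {
    { match_ws,      TOK_SKIP },
    { match_comment, TOK_SKIP },
    { match_var,     TOK_VAR },                /* listed before ID: wins ties */
    { match_id,      TOK_ID },
    { match_colon,   TOK_COLON },
    { match_semi,    TOK_SEMICOLON },
};

/* Returns the next token code (0 at end of input) and advances *cursor.
 * *text and *len receive the matched lexeme, like flex's yytext/yyleng. */
int toy_yylex(const char **cursor, const char **text, size_t *len)
{
    for (;;) {
        const char *p = *cursor;
        if (*p == '\0')
            return TOK_EOF;

        size_t best_len = 0;
        int best_tok = TOK_SKIP;
        for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
            size_t n = rules[i].match(p);
            if (n > best_len) {                /* strictly longer: first rule wins ties */
                best_len = n;
                best_tok = rules[i].token;
            }
        }
        if (best_len == 0) {                   /* no rule matched: skip one char */
            best_len = 1;
            best_tok = TOK_SKIP;
        }

        *cursor = p + best_len;
        if (best_tok != TOK_SKIP) {
            *text = p;
            *len = best_len;
            return best_tok;
        }
    }
}
```

A parser would call toy_yylex repeatedly until it returns 0, much as the default main() in the -lfl library keeps calling yylex() until it returns 0. Because the VAR rule is listed before the identifier rule, "VAR" becomes TOK_VAR while the longer match "VARX" becomes TOK_ID. Unlike this sketch, which silently skips a character no rule matches, FLEX's default rule copies such a character to the standard output.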