The parsing is a process of finding a parse tree for a string of tokens.
Equivalently, it is a process of determining whether a string of tokens can be
generated by a grammar.
The worst-case time pf parsing algorithms are O(n^{n3}) but typical is :
O(n) time.
For example. The production rules of grammar G is:
list → list + digit | list - digit | digit
digit → 0 | 1 | . . . | 9
Given token string is 9-5+2.
Parse tree is:
diagram
Each node in the parse tree is labeled by a grammar symbol.
the interior node corresponds to the left side of the
production.
the children of the interior node corresponds to the right
side of production.
The language defined by a grammar is the set of all token strings can be derived from its start symbol.
The language defined by the grammar:
list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | . . . | 9
contains all lists of digits separated by plus and minus signs.
The Epsilon, E, on the right side of the production denotes the empty string.
As we have mentioned above, the parsing is the process of determining if a string of tokens can be generated by a grammar. A parser must be capable of constructing the tree, or else the translation cannot be guaranteed correct. For any language that can be described by CFG, the parsing requires O(n3) time to parse string of n token. However, most programming languages are so simple that a parser requires just O(n) time with a single left-to-right scan over the iput string of n tokens.
There are two types of Parsing
- Easy to generate by hand.
- Examples are : Recursive-descent, Predictive.
- Not easy to handle by hands, usually compiler-generating software generate bottom up parser
- But handles larger class of grammar
- Example is LR parser.