Parsing

The parsing is a process of finding a parse tree for a string of tokens. Equivalently, it is a process of determining whether a string of tokens can be generated by a grammar.
The worst-case time pf parsing algorithms are O(nn3) but typical is : O(n) time.

For example. The production rules of grammar G is:

  list    list + digit | list - digit | digit
digit    0 | 1 | . . . | 9

Given token string is 9-5+2.

Parse tree is:

diagram

Each node in the parse tree is labeled by a grammar symbol.
    the interior node corresponds to the left side of the production.
    the children of the interior node corresponds to the right side of production.

The language defined by a grammar is the set of all token strings can be derived from its start symbol.

The language defined by the grammar:

 list     list + digit | list - digit | digit
digit   0 | 1 | 2 | . . . | 9

contains all lists of digits separated by plus and minus signs.

The Epsilon, E, on the right side of the production denotes the empty string.

As we have mentioned above, the parsing is the process of determining if a string of tokens can be generated by a grammar. A parser must be capable of constructing the tree, or else the translation cannot be guaranteed correct. For any language that can be described by CFG, the parsing requires O(n3) time to parse string of n token. However, most programming languages are so simple that a parser requires just O(n) time with a single left-to-right scan over the iput string of n tokens.

There are two types of Parsing

  1. Top-down Parsing (start from start symbol and derive string)
    A Top-down parser builds a parse tree by starting at the root and working down towards the leaves.
  2. Bottom-up Parsing (start from string and reduce to start symbol)
    A bottom-up parser builds a parser tree by starting at the leaves and working up towards the root.