- Syntax
- Semantics
- Implementing parsers from parsing expression grammars
- Lexical Analysis
- Left recursion
- References and Documentation

In computer science, a parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language.

The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s.

Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation:

- The choice operator selects the first match in a PEG, while it is ambiguous in a CFG.
- This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

Unlike CFGs, PEGs cannot be ambiguous;
*if a string parses, it has exactly
one valid parse tree*.

It is conjectured that there exist context-free languages that cannot be parsed by a PEG, but this is not yet proven.

Formally, a parsing expression grammar consists of:

- A finite set `N` of nonterminal symbols.
- A finite set `Σ` of terminal symbols that is disjoint from `N`.
- A finite set `P` of parsing rules.
- An expression `e_S` termed the starting expression.

Each parsing rule in `P` has the form `A <- e`, where `A` is a nonterminal symbol and `e` is a parsing expression.

A parsing expression is a hierarchical expression similar to a regular expression, which is constructed in the following fashion:

- An atomic parsing expression consists of:
  - any terminal symbol,
  - any nonterminal symbol, or
  - the empty string `ε`.

- Given any existing parsing expressions `e`, `e1`, and `e2`, a new parsing expression can be constructed using the following operators:
  - Sequence: `e1 e2`
  - Ordered choice: `e1 / e2`
  - Zero-or-more: `e*`
  - One-or-more: `e+`
  - Optional: `e?`
  - And-predicate: `&e`
  - Not-predicate: `!e`
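The construction rules can be sketched as a handful of Python combinators. This is a minimal illustration, not the API of any PEG library: all names (`lit`, `empty`, `seq`, `choice`, `star`, `plus`, `opt`, `and_`, `not_`) are assumptions, and a parsing expression is modelled as a function that takes the input text and a position and returns the new position, or `None` on failure.

```python
from typing import Callable, Optional

# A parsing expression: (text, position) -> new position, or None on failure.
Expr = Callable[[str, int], Optional[int]]

def lit(s: str) -> Expr:                 # terminal symbol(s)
    return lambda t, i: i + len(s) if t.startswith(s, i) else None

def empty() -> Expr:                     # the empty string: always succeeds
    return lambda t, i: i

def seq(*es: Expr) -> Expr:              # e1 e2
    def run(t, i):
        for e in es:
            i = e(t, i)
            if i is None:
                return None
        return i
    return run

def choice(*es: Expr) -> Expr:           # e1 / e2 (first success wins)
    def run(t, i):
        for e in es:
            j = e(t, i)
            if j is not None:
                return j
        return None
    return run

def star(e: Expr) -> Expr:               # e*
    def run(t, i):
        while True:
            j = e(t, i)
            # Stop on failure; also stop if e consumed nothing, a practical
            # guard (an assumption of this sketch) against looping forever.
            if j is None or j == i:
                return i
            i = j
    return run

def plus(e: Expr) -> Expr:               # e+  is  e e*
    return seq(e, star(e))

def opt(e: Expr) -> Expr:                # e?  is  e / (empty)
    return choice(e, empty())

def and_(e: Expr) -> Expr:               # &e: succeed iff e succeeds, consume nothing
    return lambda t, i: i if e(t, i) is not None else None

def not_(e: Expr) -> Expr:               # !e: succeed iff e fails, consume nothing
    return lambda t, i: i if e(t, i) is None else None
```

Nonterminals are not modelled here; in this style they would typically be ordinary named Python functions that call each other.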

The fundamental difference between context-free grammars and parsing expression grammars is that the PEG's choice operator is ordered:

- If the first alternative succeeds, the second alternative is ignored.
- Thus ordered choice is not commutative, unlike unordered choice as in context-free grammars.
- The consequence is that if a CFG is transliterated directly to a PEG, any ambiguity in the former is resolved by deterministically picking one parse tree from the possible parses.
- By carefully choosing the order in which the grammar alternatives are specified, a programmer has a great deal of control over which parse tree is selected.
- PEGs can look ahead into the input string without actually consuming it.
- The and-predicate expression `&e` invokes the sub-expression `e`, and then succeeds if `e` succeeds and fails if `e` fails, *but in either case never consumes any input*.
- The not-predicate expression `!e` succeeds if `e` fails and fails if `e` succeeds, *again consuming no input in either case*.
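A small sketch can make the ordered-choice and lookahead behaviour concrete. The helper names below (`lit`, `choice`, `seq`, `and_`, `not_`, `any_char`, `end`) are hypothetical and chosen only for this illustration; each expression maps a text and a position to a new position, or `None` on failure.

```python
def lit(s):
    # Match the literal s at position i.
    return lambda t, i: i + len(s) if t.startswith(s, i) else None

def choice(e1, e2):
    # Ordered choice: e2 is tried only if e1 fails at this position.
    def run(t, i):
        j = e1(t, i)
        return j if j is not None else e2(t, i)
    return run

def seq(e1, e2):
    def run(t, i):
        j = e1(t, i)
        return None if j is None else e2(t, j)
    return run

def and_(e):
    # And-predicate: succeeds iff e succeeds, position unchanged.
    return lambda t, i: i if e(t, i) is not None else None

def not_(e):
    # Not-predicate: succeeds iff e fails, position unchanged.
    return lambda t, i: i if e(t, i) is None else None

any_char = lambda t, i: i + 1 if i < len(t) else None
end = not_(any_char)                          # succeeds only at end of input

g1 = seq(choice(lit("a"), lit("ab")), end)    # ('a' / 'ab') then end of input
g2 = seq(choice(lit("ab"), lit("a")), end)    # ('ab' / 'a') then end of input

print(g1("ab", 0))            # None
print(g2("ab", 0))            # 2
print(and_(lit("a"))("abc", 0))   # 0
```

In `g1` the first alternative `'a'` succeeds, so `'ab'` is never tried even though the overall match then fails; swapping the alternatives, as in `g2`, changes the result. This is the sense in which ordered choice is not commutative.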

Any parsing expression grammar can be converted directly into a recursive descent parser.
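As an illustration of that conversion, the sketch below hand-translates a small arithmetic grammar into one Python function per nonterminal. The grammar and the function names are assumptions made for this example; they do not come from the text above.

```python
from typing import Optional, Tuple

Result = Optional[Tuple[int, int]]   # (value, next position), or None on failure
DIGITS = "0123456789"

# Hypothetical grammar:
#   Sum     <- Product ('+' Product)*
#   Product <- Value ('*' Value)*
#   Value   <- [0-9]+ / '(' Sum ')'

def parse_sum(t: str, i: int) -> Result:
    r = parse_product(t, i)
    if r is None:
        return None
    total, i = r
    while i < len(t) and t[i] == "+":
        r = parse_product(t, i + 1)
        if r is None:
            break                    # repetition ends; the '+' stays unconsumed
        total, i = total + r[0], r[1]
    return total, i

def parse_product(t: str, i: int) -> Result:
    r = parse_value(t, i)
    if r is None:
        return None
    prod, i = r
    while i < len(t) and t[i] == "*":
        r = parse_value(t, i + 1)
        if r is None:
            break
        prod, i = prod * r[0], r[1]
    return prod, i

def parse_value(t: str, i: int) -> Result:
    # First alternative: [0-9]+
    j = i
    while j < len(t) and t[j] in DIGITS:
        j += 1
    if j > i:
        return int(t[i:j]), j
    # Second alternative, tried only because the first failed: '(' Sum ')'
    if i < len(t) and t[i] == "(":
        r = parse_sum(t, i + 1)
        if r is not None and r[1] < len(t) and t[r[1]] == ")":
            return r[0], r[1] + 1
    return None

def parse(t: str) -> int:
    r = parse_sum(t, 0)
    if r is None or r[1] != len(t):  # require the whole input to be consumed
        raise ValueError("no parse")
    return r[0]

print(parse("2*(3+4)"))   # 14
```

Each PEG operator maps onto ordinary control flow: sequence becomes consecutive calls, ordered choice becomes an `if` chain tried in order, and repetition becomes a loop.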

Due to the unlimited lookahead capability that the grammar formalism provides, however, the resulting parser could exhibit exponential time performance in the worst case.

It is possible to obtain better performance for any parsing expression grammar by converting its recursive descent parser into a packrat parser, which always runs in linear time, at the cost of substantially greater storage space requirements.

*A packrat parser is a form of parser
similar to a recursive descent parser in construction, except that
during the parsing process
it memoizes
the intermediate results of all
invocations of the mutually recursive parsing functions*, ensuring that
each parsing function is only invoked at most once at a given input
position.

Because of this memoization, a packrat parser has the ability to parse many context-free grammars and any parsing expression grammar (including some that do not represent context-free languages) in linear time.
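The effect of memoization can be observed by counting how often the parsing functions run with and without a cache. The sketch below is only an illustration under stated assumptions: the grammar, `make_parser`, and the helper names are hypothetical, and the cache is a plain dictionary keyed by rule name and input position.

```python
def make_parser(text: str, memoize: bool):
    calls = {"n": 0}     # number of times a rule body actually runs
    memo = {}            # (rule name, position) -> result

    def rule(fn):
        # Wrap a parsing function so its result is cached per input position.
        def wrapped(i):
            key = (fn.__name__, i)
            if memoize and key in memo:
                return memo[key]
            calls["n"] += 1
            result = fn(i)
            if memoize:
                memo[key] = result
            return result
        return wrapped

    def expect(ch, i):
        # Match one character; propagates failure if i is already None.
        if i is not None and i < len(text) and text[i] == ch:
            return i + 1
        return None

    @rule
    def T(i):            # T <- '(' E ')' / 'a'
        j = expect("(", i)
        if j is not None:
            j = expect(")", E(j))
            if j is not None:
                return j
        return expect("a", i)

    @rule
    def E(i):            # E <- T '+' E / T '-' E / T
        for op in "+-":
            j = expect(op, T(i))
            if j is not None:
                j = E(j)
                if j is not None:
                    return j
        return T(i)

    return E, calls

text = "(" * 6 + "a" + ")" * 6
slow, slow_calls = make_parser(text, memoize=False)
fast, fast_calls = make_parser(text, memoize=True)
assert slow(0) == fast(0) == len(text)
print(slow_calls["n"], fast_calls["n"])
```

Without the cache, `E` re-parses the same `T` at the same position for each alternative, so the work roughly triples with every level of nesting; with the cache, each rule body runs at most once per position, which is the property the linear-time bound relies on.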

Examples of memoized recursive descent parsers are known from at least as early as 1993.

Note that this analysis of the performance
of a packrat parser **assumes that enough memory is available to hold all
of the memoized results**; in practice, if there were not enough memory,
some parsing functions might have to be invoked more than once at the
same input position, and consequently the parser could take more than
linear time.

It is also possible to build LL parsers and LR parsers from parsing expression grammars, with better worst-case performance than a recursive descent parser, but the unlimited lookahead capability of the grammar formalism is then lost. Therefore, not all languages that can be expressed using parsing expression grammars can be parsed by LL or LR parsers.

Parsers for languages expressed as a CFG, such as LR parsers, require a separate tokenization step to be done first, which breaks up the input based on the location of spaces, punctuation, etc.

The tokenization is necessary because of the way these parsers use lookahead to parse CFGs that meet certain requirements in linear time.

PEGs do not require tokenization to be a separate step, and tokenization rules can be written in the same way as any other grammar rule.
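A minimal sketch of this idea, assuming a hypothetical grammar for comma-separated identifiers in which whitespace and identifier "tokens" are ordinary rules (`Spacing`, `Ident`) handled by the same parser rather than by a separate lexer:

```python
# Hypothetical grammar:
#   List    <- Spacing Ident (',' Spacing Ident)* !.
#   Ident   <- [a-z]+ Spacing
#   Spacing <- ' '*

def spacing(t, i):
    # Spacing <- ' '*  (always succeeds, returns the new position)
    while i < len(t) and t[i] == " ":
        i += 1
    return i

def ident(t, i):
    # Ident <- [a-z]+ Spacing  (returns (name, next position) or None)
    j = i
    while j < len(t) and "a" <= t[j] <= "z":
        j += 1
    if j == i:
        return None
    return t[i:j], spacing(t, j)

def ident_list(t):
    # List <- Spacing Ident (',' Spacing Ident)* !.
    r = ident(t, spacing(t, 0))
    if r is None:
        return None
    names, i = [r[0]], r[1]
    while i < len(t) and t[i] == ",":
        r = ident(t, spacing(t, i + 1))
        if r is None:
            break                 # repetition ends; the ',' stays unconsumed
        names.append(r[0])
        i = r[1]
    return names if i == len(t) else None   # !. : require end of input

print(ident_list(" foo , bar,baz "))   # ['foo', 'bar', 'baz']
```

Because trailing whitespace is consumed inside `Ident` itself, the higher-level rule never has to mention it, which is a common convention in scannerless grammars.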

PEGs cannot express left-recursive rules where a rule refers to itself without moving forward in the string. For example, the following left-recursive CFG rule:

    string-of-a -> string-of-a 'a' | 'a'

can be rewritten in a PEG using the plus operator:

    string-of-a <- 'a'+

The process of rewriting indirectly left-recursive rules is complex in some packrat parsers, especially when semantic actions are involved.
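The problem with left recursion shows up directly when the rule is transliterated into a recursive descent function. In the sketch below (function names are illustrative assumptions), the left-recursive version calls itself at the same input position before consuming anything and therefore never terminates on its own, while the version based on the plus operator is a simple loop.

```python
def string_of_a_left(t, i):
    # Direct transliteration of: string-of-a <- string-of-a 'a' / 'a'
    # The first step re-enters the same rule at the same position.
    j = string_of_a_left(t, i)
    if j is not None and t.startswith("a", j):
        return j + 1
    return i + 1 if t.startswith("a", i) else None

def string_of_a(t, i):
    # Rewritten rule: string-of-a <- 'a'+
    j = i
    while t.startswith("a", j):
        j += 1
    return j if j > i else None

try:
    string_of_a_left("aaa", 0)
except RecursionError:
    print("left-recursive rule exhausted the call stack")

print(string_of_a("aaab", 0))   # 3
```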

- See Parsing Expression Grammar
- PEG.js documentation
- Testing PEG.js Online
- Michael's Blog: JavaScript Parser Generators. The PEG.js Tutorial
- The Packrat Parsing and Parsing Expression Grammars Page
- PL101: Create Your Own Programming Language. See [3] and [4]
- PL101: Create Your Own Programming Language: Parsing

2016-03-27