Lab: Ambiguity in C++

This lab illustrates a problem that arises in C++. The C++ syntax does not disambiguate between expression statements (stmt) and declaration statements (decl). The ambiguity arises when an expression statement has a function-style cast as its left-most subexpression. Since C does not support function-style casts, this ambiguity does not occur in C programs. For example, the phrase

int (x) = y+z;

can be parsed either as a decl or as a stmt.

The disambiguation rule used in C++ is that if the statement can be interpreted both as a declaration and as an expression, the statement is interpreted as a declaration statement.

In the examples below (type_spec stands for a type specifier), the entries in the expr column disambiguate into expression statements because the potential declarator is followed by an operator other than = or ;. The entries in the dec column are declaration statements:

expr                        dec

type_spec(i)++;             type_spec(*i)(int);
type_spec(i,3)<<d;          type_spec(j)[5];
type_spec(i)->l=24;         type_spec(m) = { 1, 2 };
                            type_spec(a);
                            type_spec(*b)();
                            type_spec(c)=23;
                            type_spec(d),e,f,g=0;
                            type_spec(h)(e,3);

On this problem, Bjarne Stroustrup remarks:

Consider analyzing a statement consisting of a sequence of tokens as follows:
              type_spec (dec_or_exp) tail
Here dec_or_exp must be a declarator, an expression, or both for the statement to be legal. This implies that tail must be a semicolon, something that can follow a parenthesized declarator or something that can follow a parenthesized expression, that is, an initializer, const, volatile, (, [, or a postfix or infix operator. The general cases cannot be resolved without backtracking, nested grammars or similar advanced parsing strategies. In particular, the lookahead needed to disambiguate this case is not limited.

The following grammar depicts an oversimplified version of the C++ ambiguity:

$ cat CplusplusNested.y 
%token ID INT NUM

%right '='
%left '+'

%%
prog:
    /* empty */
  | prog stmt
;

stmt: 
    expr ';' 
  | decl    
;

expr:
    ID 
  | NUM
  | INT '(' expr ')' /* typecast */ 
  | expr '+' expr
  | expr '=' expr
;

decl:
    INT declarator ';'
  | INT declarator '=' expr ';'
;

declarator:
    ID 
  | '(' declarator ')'
;

%%

Write a PegJS program in CoffeeScript that correctly distinguishes declarations from statements. Here is an example of a program that uses a solution to the problem:

[~/Dropbox/src/javascript/PLgrado/pegjs-coffee-plugin/examples(master)]$ cat use_cplusplus.coffee 
PEG = require("./cplusplus.js")
input = "int (a); int c = int (b);"

r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)

input = "int b = 4+2  ;  "
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)

input = "bum = caf = 4-1;\n"
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)

input = "b2 = int(4);"
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)

input = "int(4);"
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)

And here is an example of the output:

$ pegcoffee cplusplus.pegjs 
$ coffee use_cplusplus.coffee 
input = 'int (a); int c = int (b);'
output=["decl","decl"]
input = 'int b = 4+2  ;  '
output=["decl"]
input = 'bum = caf = 4-1;
'
output=["stmt"]
input = 'b2 = int(4);'
output=["stmt"]
input = 'int(4);'
output=["stmt"]

Casiano Rodríguez León
2016-03-27