El objetivo es crear nuestro propio motor de expresiones regulares ([4]).
Para ello, una expresión regular como c(a|d)+r
debe ser expresada mediante una lambda.
Una regexp como c(a|d)+r es el resultado de la composición de lambdas.
Por ejemplo, la regexp c(a|d)+r se denotará por:
seq(char('c'), seq(plus(alt(char('a'), char('d'))), char('r')))
La idea es que char, seq, plus, etc. son métodos
que retornan funciones (lambdas). La lambda retornada por char('d') reconoce
el lenguaje 'd', la lambda retornada por star(char('c')) reconoce
el lenguaje c* y la lambda retornada por
seq(char('c'), seq(plus(alt(char('a'), char('d'))), char('r')))
reconoce el lenguaje c(a|d)+r.
Mas en concreto, la lambda que representa la expresión regular r
x y
false si no hay un prefijo de x que case con r
x si hubo matching
Por ejemplo, el método char recibe una cadena c
y retorna una lambda que recibe una
cadena x y
false si c no es un prefijo de x.
char usa circuito corto:
def char(c)
lambda { |x| x[0,c.length] == c and x[c.length..-1] }
end
ruby-1.9.2-p290 :001 > load "regexpcalc1213.rb"
=> true
ruby-1.9.2-p290 :002 > include ULL::ETSII::AluXXX::LambdaRegexp
=> Object
ruby-1.9.2-p290 :003 > char('c')
=> #<Proc:0x007f998b027408@/Users/casiano/Dropbox/src/ruby/rubytesting/TheRubyProgrammingLanguage/Chapter6MethodsProcsLambdasAndClosures/regexp/regexpcalc1213.rb:29 (lambda)>
ruby-1.9.2-p290 :004 > char('c')['c'] # 'c' casa con /c/ el resto es vacío ""
=> ""
ruby-1.9.2-p290 :005 > char('c')['d'] # 'd' no casa con /c/. Se retorna false
=> false
ruby-1.9.2-p290 :006 > char('c')['cd'] # El prefijo 'c' casa con 'cd'. Resto = 'd'
=> "d"
ruby-1.9.2-p290 :007 > star(char('c'))['cccef'] # El prefijo 'ccc' casa con /c*/. Resto = 'ef'
=> "ef"
ruby-1.9.2-p290 :008 > star(alt(char('c'), char('e')))['cccef'] # El prefijo 'e' casa con /(c|e)*/. Resto = 'f'
=> "f"
ruby-1.9.2-p290 :009 >
La siguiente secuencia de llamadas:
e = seq(char('c'), char('d'))
s = 'cde'
remaining = e[s] # 'cde' Matched. Remaining = 'e'
puts "'#{s}' Matched. Remaining = '#{remaining}'" if remaining
e = seq(char('c'), alt(star(char('d')), char('r')))
s = 'cdddf' #
remaining = e[s] # 'cdddf' Matched. Remaining = 'f'
puts "'#{s}' Matched. Remaining = '#{remaining}'" if remaining
s = 'crddf' # 'crddf' Matched. Remaining = 'rddf'
remaining = e[s]
puts "'#{s}' Matched. Remaining = '#{remaining}'" if remaining
Debería producir una salida parecida a esta:
MacBookdeCasiano:rubytesting casiano$ ruby regexpcalc.rb 'cde' Matched. Remaining = 'e' 'cdddf' Matched. Remaining = 'f' 'crddf' Matched. Remaining = 'rddf'
Es posible usar overriding de operadores para mejorar la expresividad de la notación. He aqui un ejemplo que utiliza una versión en la que se han definido los operadores:
- es seq
| es alt
+ unario es plus
~ es star
re es una versión de char
e = 'cd'.re
s = 'cde'
remaining = e[s] # 'cde' Matched. Remaining = 'e'
puts "/cd/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining
e = 'd'.re | 'r'.re
s = 'rdddf'
remaining = e[s] # 'rdddf' Matched. Remaining = 'dddf'
puts "/d|r/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining
e = 'c'.re - ('d'.re | 'r'.re)
s = 'crddf'
remaining = e[s] # 'crddf' Matched. Remaining = 'ddf'
puts "/c(d|r)/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining
e = 'c'.re - +('d'.re | 'r'.re)
s = 'crddf'
remaining = e[s]
puts "/c(d|r)+/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining
e = 'c'.re - ~('d'.re | 'r'.re)
s = 'cdrdrf'
remaining = e[s]
puts "/c(d|r)*/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining
e = 'c'.re - ~('d'.re | 'r'.re)
s = 'cfff'
remaining = e[s]
puts "/c(d|r)*/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining
e = 'c'.re - ~('d'.re | 'r'.re)
s = 'cdrd'
remaining = e[s]
puts "/c(d|r)*/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining
e = 'c'.re - +('d'.re | 'r'.re)
s = 'cfff'
remaining = e[s] # 'cfff' didn't match. Remaining = 'false'
puts "/c(d|r)+/ '#{s}' didn't match. Remaining = '#{remaining}'" unless remaining
Cuando se ejecuta, este programa produce:
~/rubytesting$ ruby regexpcalc2.rb /cd/ 'cde' Matched. Remaining = 'e' /d|r/ 'rdddf' Matched. Remaining = 'dddf' /c(d|r)/ 'crddf' Matched. Remaining = 'ddf' /c(d|r)+/ 'crddf' Matched. Remaining = 'f' /c(d|r)*/ 'cdrdrf' Matched. Remaining = 'f' /c(d|r)*/ 'cfff' Matched. Remaining = 'fff' /c(d|r)*/ 'cdrd' Matched. Remaining = '' /c(d|r)+/ 'cfff' didn't match. Remaining = 'false'
module_function para
que el espacio de nombres sea incluíble
(véase la sección 14.7.4 de los apuntes):
module ULL
module ETSII
module AluXXX
module LambdaRegexp
module_function
def epsilon
lambda {|x| x }
end
....
end # Lambda
end # AluXXX
end # ETSII
end # ULL
ull-etsii-aluXX-lambdaregexp