El objetivo es crear nuestro propio motor de expresiones regulares ([4]).
Para ello, una expresión regular como c(a|d)+r
debe ser expresada mediante una lambda.
Una regexp como c(a|d)+r
es el resultado de la composición de lambdas.
Por ejemplo, la regexp c(a|d)+r
se denotará por:
seq(char('c'), seq(plus(alt(char('a'), char('d'))), char('r')))
La idea es que char
, seq
, plus
, etc. son métodos
que retornan funciones (lambdas). La lambda retornada por char('d')
reconoce
el lenguaje 'd'
, la lambda retornada por star(char('c'))
reconoce
el lenguaje c*
y la lambda retornada por
seq(char('c'), seq(plus(alt(char('a'), char('d'))), char('r')))
reconoce el lenguaje c(a|d)+r
.
Mas en concreto, la lambda que representa la expresión regular r
x
y
false
si no hay un prefijo de x
que case con r
x
si hubo matching
Por ejemplo, el método char
recibe una cadena c
y retorna una lambda que recibe una
cadena x
y
false
si c
no es un prefijo de x
.
char
usa circuito corto:
def char(c) lambda { |x| x[0,c.length] == c and x[c.length..-1] } end
ruby-1.9.2-p290 :001 > load "regexpcalc1213.rb" => true ruby-1.9.2-p290 :002 > include ULL::ETSII::AluXXX::LambdaRegexp => Object ruby-1.9.2-p290 :003 > char('c') => #<Proc:0x007f998b027408@/Users/casiano/Dropbox/src/ruby/rubytesting/TheRubyProgrammingLanguage/Chapter6MethodsProcsLambdasAndClosures/regexp/regexpcalc1213.rb:29 (lambda)> ruby-1.9.2-p290 :004 > char('c')['c'] # 'c' casa con /c/ el resto es vacío "" => "" ruby-1.9.2-p290 :005 > char('c')['d'] # 'd' no casa con /c/. Se retorna false => false ruby-1.9.2-p290 :006 > char('c')['cd'] # El prefijo 'c' casa con 'cd'. Resto = 'd' => "d" ruby-1.9.2-p290 :007 > star(char('c'))['cccef'] # El prefijo 'ccc' casa con /c*/. Resto = 'ef' => "ef" ruby-1.9.2-p290 :008 > star(alt(char('c'), char('e')))['cccef'] # El prefijo 'e' casa con /(c|e)*/. Resto = 'f' => "f" ruby-1.9.2-p290 :009 >
La siguiente secuencia de llamadas:
e = seq(char('c'), char('d')) s = 'cde' remaining = e[s] # 'cde' Matched. Remaining = 'e' puts "'#{s}' Matched. Remaining = '#{remaining}'" if remaining e = seq(char('c'), alt(star(char('d')), char('r'))) s = 'cdddf' # remaining = e[s] # 'cdddf' Matched. Remaining = 'f' puts "'#{s}' Matched. Remaining = '#{remaining}'" if remaining s = 'crddf' # 'crddf' Matched. Remaining = 'rddf' remaining = e[s] puts "'#{s}' Matched. Remaining = '#{remaining}'" if remainingDebería producir una salida parecida a esta:
MacBookdeCasiano:rubytesting casiano$ ruby regexpcalc.rb 'cde' Matched. Remaining = 'e' 'cdddf' Matched. Remaining = 'f' 'crddf' Matched. Remaining = 'rddf'
Es posible usar overriding de operadores para mejorar la expresividad de la notación. He aqui un ejemplo que utiliza una versión en la que se han definido los operadores:
-
es seq
|
es alt
+
unario es plus
~
es star
re
es una versión de char
e = 'cd'.re s = 'cde' remaining = e[s] # 'cde' Matched. Remaining = 'e' puts "/cd/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining e = 'd'.re | 'r'.re s = 'rdddf' remaining = e[s] # 'rdddf' Matched. Remaining = 'dddf' puts "/d|r/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining e = 'c'.re - ('d'.re | 'r'.re) s = 'crddf' remaining = e[s] # 'crddf' Matched. Remaining = 'ddf' puts "/c(d|r)/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining e = 'c'.re - +('d'.re | 'r'.re) s = 'crddf' remaining = e[s] puts "/c(d|r)+/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining e = 'c'.re - ~('d'.re | 'r'.re) s = 'cdrdrf' remaining = e[s] puts "/c(d|r)*/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining e = 'c'.re - ~('d'.re | 'r'.re) s = 'cfff' remaining = e[s] puts "/c(d|r)*/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining e = 'c'.re - ~('d'.re | 'r'.re) s = 'cdrd' remaining = e[s] puts "/c(d|r)*/ '#{s}' Matched. Remaining = '#{remaining}'" if remaining e = 'c'.re - +('d'.re | 'r'.re) s = 'cfff' remaining = e[s] # 'cfff' didn't match. Remaining = 'false' puts "/c(d|r)+/ '#{s}' didn't match. Remaining = '#{remaining}'" unless remainingCuando se ejecuta, este programa produce:
~/rubytesting$ ruby regexpcalc2.rb /cd/ 'cde' Matched. Remaining = 'e' /d|r/ 'rdddf' Matched. Remaining = 'dddf' /c(d|r)/ 'crddf' Matched. Remaining = 'ddf' /c(d|r)+/ 'crddf' Matched. Remaining = 'f' /c(d|r)*/ 'cdrdrf' Matched. Remaining = 'f' /c(d|r)*/ 'cfff' Matched. Remaining = 'fff' /c(d|r)*/ 'cdrd' Matched. Remaining = '' /c(d|r)+/ 'cfff' didn't match. Remaining = 'false'
module_function
para
que el espacio de nombres sea incluíble
(véase la sección 14.7.4 de los apuntes):
module ULL module ETSII module AluXXX module LambdaRegexp module_function def epsilon lambda {|x| x } end .... end # Lambda end # AluXXX end # ETSII end # ULL
ull-etsii-aluXX-lambdaregexp