[Gambas-user] Regex - expert opinion requested

Fernando Cabral fernandojosecabral at ...626...
Wed May 31 23:30:20 CEST 2017


This is only for those who like to work with regular expressions.
It is a performance issue. I am using 26 different regular expressions of
this kind:

txt = RegExp.Replace(TextoBruto, NaoNumerais, "&1\n", RegExp.UTF8)
txt = RegExp.Replace(Txt, "\n\n+?", "\n", RegExp.UTF8)
txt = RegExp.Replace(Txt, "^\n+?", "", RegExp.UTF8)
txt = RegExp.Replace(Txt, "\n+?$", "", RegExp.UTF8)

Those are pretty fast: less than one second for a 415 KB text (about
six thousand lines).

But the following code is quite slow. About 27 seconds each:

ttDigitos = String.Len(RegExp.Replace(TextoBruto, "[^0-9]", "", RegExp.UTF8)) ' 27 seconds
ttPontuacao = String.Len(RegExp.Replace(TextoBruto, "[^.:;,?!]", "", RegExp.UTF8))  ' 27 seconds
ttBrancos = String.Len(RegExp.Replace(TextoBruto, "[^ \t]", "", RegExp.UTF8))   ' 27 seconds
Print "Especial antigo", Now
'ttEspeciais = String.Len(RegExp.Replace(TextoBruto, "[^-[\\](){}\"@#$%&*_+=<>/\\\\|ºª§“”‘’]", "", RegExp.UTF8))  ' 27 seconds
Print "Especial novo", Now
ttEspeciais = String.Len(RegExp.Replace(TextoBruto, "[-aeiouãáéíóúâõàbcçdfghjlmnpqrstvxyz ,.:;!?()0-9êôwkèìòùäÄÁÉÍÓÚÀÈÌÒÙÂÔÂÊÔÇABCDEFGHIJKLMNOPQRSTUVWXYZ]", "", RegExp.UTF8))  ' 27 seconds
Print "fim especial novo", Now

Quite slow. The whole program takes 2 minutes to run. The four statements
above alone consume 108 of those 120 seconds.

I tried some variations. For instance, ttEspeciais = .... has two versions.
One negates what to leave in, the other describes what to take out. End
result is the same. And so is the time spent.

I have also written much longer code that does the same thing using loops,
searching for the characters I want in or want out. The whole thing
runs in about 5 seconds (but this code took me much, much longer to write).
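
For illustration, a loop of that kind might look like the sketch below. It
is not the actual program: the procedure name CountClasses and the local
counter names are hypothetical, and it covers only the digit, punctuation
and blank counts. It assumes the text is UTF-8 and relies on the fact that
all the characters tested are ASCII, so a byte-by-byte scan with Len and
Mid$ cannot mistake part of a multi-byte character for one of them.

```gambas
' Sketch only: one pass over the bytes of the text, counting three ASCII classes.
' CountClasses and its local names are hypothetical, not the original program.
Public Sub CountClasses(Texto As String)

  Dim i As Integer
  Dim c As String
  Dim nDigitos As Integer
  Dim nPontuacao As Integer
  Dim nBrancos As Integer

  ' Len() and Mid$() work on bytes. Every character tested here is ASCII,
  ' and ASCII bytes never occur inside a multi-byte UTF-8 sequence.
  For i = 1 To Len(Texto)
    c = Mid$(Texto, i, 1)
    If InStr("0123456789", c) > 0 Then
      Inc nDigitos
    Else If InStr(".:;,?!", c) > 0 Then
      Inc nPontuacao
    Else If (c = " ") Or (c = "\t") Then
      Inc nBrancos
    Endif
  Next

  Print "Digitos:", nDigitos
  Print "Pontuacao:", nPontuacao
  Print "Brancos:", nBrancos

End
```

A single pass like this fills all three counters at once, instead of
building a stripped copy of the whole text for each count.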

I wonder if any of you could suggest potentially faster regular expressions
that could replace the specimens above.

Regards

- fernando