[Gambas-user] Regex - expert opinion requested
Fernando Cabral
fernandojosecabral at ...626...
Wed May 31 23:30:20 CEST 2017
This is only for those who like to work with regular expressions.
It is a performance issue. I am using 26 different regular expressions of
this kind:
txt = RegExp.Replace(TextoBruto, NaoNumerais, "&1\n", RegExp.UTF8)
txt = RegExp.Replace(Txt, "\n\n+?", "\n", RegExp.UTF8)
txt = RegExp.Replace(Txt, "^\n+?", "", RegExp.UTF8)
txt = RegExp.Replace(Txt, "\n+?$", "", RegExp.UTF8)
Those are pretty fast: less than one second for a 415 KB text (about
six thousand lines).
But the following statements are quite slow, about 27 seconds each:
ttDigitos = String.Len(RegExp.Replace(TextoBruto, "[^0-9]", "", RegExp.UTF8)) ' 27 seconds
ttPontuacao = String.Len(RegExp.Replace(TextoBruto, "[^.:;,?!]", "", RegExp.UTF8)) ' 27 seconds
ttBrancos = String.Len(RegExp.Replace(TextoBruto, "[^ \t]", "", RegExp.UTF8)) ' 27 seconds
Print "Especial antigo", Now
'ttEspeciais = String.Len(RegExp.Replace(TextoBruto, "[^-[\\](){}\"@#$%&*_+=<>/\\\\|ºª§“”‘’]", "", RegExp.UTF8)) ' 27 seconds
Print "Especial novo", Now
ttEspeciais = String.Len(RegExp.Replace(TextoBruto, "[-aeiouãáéíóúâõàbcçdfghjlmnpqrstvxyz ,.:;!?()0-9êôwkèìòùäÄÁÉÍÓÚÀÈÌÒÙÂÔÂÊÔÇABCDEFGHIJKLMNOPQRSTUVWXYZ]", "", RegExp.UTF8)) ' 27 seconds
Print "fim especial novo", Now
Quite slow. The whole program takes 2 minutes to run, and the lines above
alone consume 108 of those 120 seconds.
I tried some variations. For instance, ttEspeciais = ... has two versions:
one negates what to leave in, the other lists what to take out. The end
result is the same, and so is the time spent.
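One further variation that may be worth timing, offered only as a sketch: instead of deleting every character that does not belong to a class, delete the runs of characters that do belong to it and take the difference in length. It uses only RegExp.Replace and String.Len, as in the lines above. ttTotal is a hypothetical extra Integer variable, and whether this is actually faster is an assumption (it would help only if the cost grows with the number of replacements made).

```gambas
' Sketch: count = (total length) - (length after removing the matching runs).
' The "+" groups consecutive matches, so fewer replacements are made
' when the characters being counted are rare in the text.
' ttTotal is a hypothetical Integer declared alongside the other counters.
ttTotal = String.Len(TextoBruto)
ttDigitos = ttTotal - String.Len(RegExp.Replace(TextoBruto, "[0-9]+", "", RegExp.UTF8))
ttPontuacao = ttTotal - String.Len(RegExp.Replace(TextoBruto, "[.:;,?!]+", "", RegExp.UTF8))
ttBrancos = ttTotal - String.Len(RegExp.Replace(TextoBruto, "[ \t]+", "", RegExp.UTF8))
```

The counts should match the originals, since every character is either removed by the positive class or left in place.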
I have also written much longer code that does the same thing using loops
and searching for the characters I want in or out. The whole thing
runs in about 5 seconds (but this code took me much, much longer to write).
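To illustrate the loop approach, here is a minimal sketch for the digit count only. It is not the code mentioned above; CountDigits is a hypothetical name, and the byte-by-byte scan assumes that only the ASCII digits 0-9 are being counted (in UTF-8, those byte values never occur inside a multi-byte character).

```gambas
' Hypothetical helper: counts ASCII digits by scanning the string
' byte by byte, without any regular expression.
Public Function CountDigits(sText As String) As Integer

  Dim i As Integer
  Dim iCode As Integer
  Dim iTotal As Integer

  For i = 1 To Len(sText)        ' Len() counts bytes, not UTF-8 characters
    iCode = Asc(sText, i)        ' byte value at position i
    If iCode >= 48 And iCode <= 57 Then Inc iTotal   ' "0" to "9"
  Next

  Return iTotal

End
```

It would be called as ttDigitos = CountDigits(TextoBruto). Classes containing non-ASCII characters such as º or “ cannot be counted with a plain byte scan and would need character-aware handling.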
I wonder if any of you could suggest potentially faster regular expressions
that could replace the specimens above.
Regards
- fernando
--
Fernando Cabral