[Gambas-user] Problem with lazy regexp

Tobias Boege taboege at ...626...
Mon Apr 24 09:57:37 CEST 2017


On Sun, 23 Apr 2017, T Lee Davidson wrote:
> According to http://gambaswiki.org/wiki/doc/pcre , using "*?" in a regular
> expression should lazily match 0 or more characters. However, it appears to
> act greedily.
> 
> I am trying to do some very simple HTML tag stripping with
> 'Regex.Replace(sText, "<.*?>", "")', and it takes out way more than just the
> tags.
> 
> Have I misunderstood the documentation?
> 

I believe you have read the documentation correctly. I get the same greedy
behaviour from "<.*?>". The Gambas wiki page seems to be copied from the
libpcre documentation [1], and the entry under QUANTIFIERS:

  *?          0 or more, lazy

leaves hardly any room for misinterpretation. I just tried the following line:

  RegExp.Replace("<tag abc=\"xyz\">content</tag>", "<.*>", "", RegExp.Ungreedy)

which correctly delivers "content", if you are interested in a workaround.
Note that it uses the plain "*" quantifier together with the RegExp.Ungreedy
option. If no one else gets to it first, I will try to remember to have a
look at gb.pcre this evening.
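
For comparison, here is a minimal sketch of how the patterns are expected to
behave according to the documented semantics. It is written with Python's re
module rather than gb.pcre, purely as an assumption of convenience because it
is easy to run, so it shows the intended behaviour and not what Gambas
currently does. The helper name strip_tags and the negated-class pattern
"<[^>]*>" are my own additions for illustration. The latter is a common way to
avoid relying on lazy quantifiers at all, but I have not verified it against
gb.pcre.

```python
import re

# Sample input taken from the line tried above.
SAMPLE = '<tag abc="xyz">content</tag>'

GREEDY = r"<.*>"      # "*" grabs as much as possible, then backtracks
LAZY = r"<.*?>"       # "*?" grabs as little as possible
NEGATED = r"<[^>]*>"  # can never run past the first ">", lazy or not


def strip_tags(text, pattern):
    """Remove every match of pattern from text (hypothetical helper)."""
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    for name, pattern in (("greedy", GREEDY),
                          ("lazy", LAZY),
                          ("negated class", NEGATED)):
        print(f"{name:14} {pattern:10} -> {strip_tags(SAMPLE, pattern)!r}")
```

With the documented semantics, the greedy pattern swallows the whole string
from the first "<" to the last ">", while the lazy and negated-class patterns
remove only the two tags and leave "content".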

Regards,
Tobi

[1] http://www.pcre.org/current/doc/html/pcre2syntax.html

-- 
"There's an old saying: Don't change anything... ever!" -- Mr. Monk



