[Gambas-user] Problem with lazy regexp
Tobias Boege
taboege at ...626...
Mon Apr 24 09:57:37 CEST 2017
On Sun, 23 Apr 2017, T Lee Davidson wrote:
> According to http://gambaswiki.org/wiki/doc/pcre , using "*?" in a regular
> expression should lazily match 0 or more characters. However, it appears to
> act greedily.
>
> I am trying to do some very simple HTML tag stripping with
> 'Regex.Replace(sText, "<.*?>", "")', and it takes out way more than just the
> tags.
>
> Have I misunderstood the documentation?
>
I believe you read the documentation correctly. I get the same greedy
behaviour from "<.*?>". The Gambas wiki page seems to be copied from the
libpcre documentation [1], and the entry under QUANTIFIERS,

    *?  0 or more, lazy

hardly leaves room for misinterpretation. If you are interested in a
workaround, I just tried the following line:

    RegExp.Replace("<tag abc=\"xyz\">content</tag>", "<.*>", "", RegExp.Ungreedy)

which correctly delivers "content".
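
For comparison, here is a minimal sketch of what the quantifiers are expected
to do under PCRE-style semantics. It uses Python's re module rather than
gb.pcre, so it says nothing about where the Gambas behaviour comes from; it
only shows the results one would expect for the same sample string. The third
pattern, a negated character class, is an extra suggestion of mine that does
not depend on greediness at all. The variable names are purely illustrative.

```python
import re

# Same sample string as in the RegExp.Replace line above.
html = '<tag abc="xyz">content</tag>'

# Greedy: ".*" runs to the last ">" in the string, so one match
# swallows both tags and the text between them.
greedy = re.sub(r"<.*>", "", html)

# Lazy: ".*?" stops at the nearest ">", so each tag is matched
# separately and the text between them survives.
lazy = re.sub(r"<.*?>", "", html)

# Negated class: "[^>]*" can never cross a ">", so the result is the
# same whether the quantifier is greedy or lazy.
negated = re.sub(r"<[^>]*>", "", html)

print(repr(greedy))   # ''
print(repr(lazy))     # 'content'
print(repr(negated))  # 'content'
```

If the "<[^>]*>" pattern behaves the same way in gb.pcre, it might serve as a
second workaround that needs no option flag, but I have not tried it there.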
If no one else gets to it first, I will try to remember to have a look at
gb.pcre this evening.
Regards,
Tobi
[1] http://www.pcre.org/current/doc/html/pcre2syntax.html
--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk