[Gambas-user] Problem with lazy regexp
Tobias Boege
taboege at ...626...
Mon Apr 24 10:25:45 CEST 2017
On Mon, 24 Apr 2017, Tobias Boege wrote:
> On Sun, 23 Apr 2017, T Lee Davidson wrote:
> > According to http://gambaswiki.org/wiki/doc/pcre , using "*?" in a regular
> > expression should lazily match 0 or more characters. However, it appears to
> > act greedily.
> >
> > I am trying to do some very simple HTML tag stripping with
> > 'RegExp.Replace(sText, "<.*?>", "")', and it takes out way more than just the
> > tags.
> >
> > Have I misunderstood the documentation?
> >
>
> I believe you are correct. I get the same greedy behaviour from "<.*?>".
> The Gambas wiki page seems to be copied from the libpcre documentation [1]
> and the point, under QUANTIFIERS:
>
> *? 0 or more, lazy
>
> hardly gives room for misinterpretation. I just tried the following line:
>
> RegExp.Replace("<tag abc=\"xyz\">content</tag>", "<.*>", "", RegExp.Ungreedy)
>
> which correctly delivers "content", if you are interested in a workaround.
> If no one else does, I can (try to remember to) have a look at gb.pcre
> this evening.
>
It's still before noon, but I saw that the RegExp.Replace() routine always
automatically adds the RegExp.Ungreedy flag to the regular expression. With
that in mind, I tried
RegExp.Replace(sText, "<.*>", "")
and it worked ungreedily. In fact, since the compilation options I pass are
always OR'd with that flag, my successful pattern above with RegExp.Ungreedy
was just an accident: setting RegExp.Ungreedy there was redundant. The PCRE
documentation [1] mentions a fact that the Gambas documentation [2] omits:
  PCRE2_UNGREEDY      Invert greediness of quantifiers
(The Gambas documentation reads as if the flag makes everything ungreedy.)
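So the greediness you got is explained. I'll add some bits to the
documentation later. Basically, RegExp.Replace() is always ungreedy. You can
still get greedy quantifiers by writing lazy ones (such as "*?") in your
pattern, because the flag inverts them back to greedy. That is why "<.*?>"
removed far more than just the tags.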
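Below is a minimal Gambas sketch of that inversion. It is not tested against every Gambas version and makes three assumptions:

* The project is a console project with the gb.pcre component enabled.
* RegExp.Replace() replaces every match, as the "content" result above suggests.
* The sample string and the variable name sHtml are made up for illustration.

```gambas
' Sketch only: assumes a console project with gb.pcre enabled.
' RegExp.Replace() always compiles its pattern with RegExp.Ungreedy,
' and PCRE2_UNGREEDY inverts quantifier greediness.

Public Sub Main()

  ' Hypothetical sample input with two pairs of tags
  Dim sHtml As String = "<b>bold</b> and <i>italic</i>"

  ' Under the forced Ungreedy flag the plain "*" behaves lazily:
  ' each tag is matched on its own and removed, leaving the text.
  Print RegExp.Replace(sHtml, "<.*>", "")

  ' "*?" is inverted back to greedy: the match runs from the first "<"
  ' to the last ">", so the whole string is removed.
  Print RegExp.Replace(sHtml, "<.*?>", "")

End
```

In other words, under RegExp.Replace() you write patterns with the opposite quantifiers to the ones the PCRE documentation describes for default compilation.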
Regards,
Tobi
[1] http://www.pcre.org/current/doc/html/pcre2_compile.html
[2] http://gambaswiki.org/wiki/comp/gb.pcre/regexp/ungreedy
--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk