[Gambas-user] Regular expressions

Tobias Boege tobs at taboege.de
Thu Dec 30 20:38:39 CET 2021


On Thu, 30 Dec 2021, T Lee Davidson wrote:
> On 12/30/21 07:37, Hans Lehmann wrote:
> > Hello.
> > 
> > I am looking for 3 regular pattern expressions that check in a DokuWiki
> > text whether the formats `underlined`, `italic` or `bold` are in the
> > given text.
> > 
> > Example text:
> > 
> > The //installation// of a **SSH server** on the remote __computer__ is worthwhile in any case!
> > 
> > My many attempts ended up with these patterns, which unfortunately do not work:
> > 
> > IF sLine Like "*[_]{2}*[_]{2}*" THEN sLine = Replace(sLine, "__", "<uuu>")
> > IF sLine Like "*[*]{2}*[*]{2}*" THEN sLine = Replace(sLine, "**", "<bbb>")
> > IF sLine Like "*[/]{2}*[/]{2}*" THEN sLine = Replace(sLine, "//", "<iii>")
> > 
> > Any hint in the right direction will be gladly read.
> > 
> > With kind regards
> > 
> > Hans
> 
> DokuWiki stores all its data in UTF-8.[1] This is almost certainly the
> reason it is not working since LIKE deals only with ASCII strings.[2]
> 
> You should use RegExp.Replace (gb.pcre) [3] as LIKE is not a valid solution for this particular scenario.
> 

In addition, the LIKE patterns quoted above fall into a very common trap
with regular expressions (or similar patterns): if you match against

  [_]{2}(.*)[_]{2}

(a straightforward translation of the given pattern to PCRE) as suggested
in another email, then problems will arise if there is more than one
underline markup in your string, because the `.*` in the middle is by
default "greedy". The line

  The __installation__ of a __SSH server__ on the remote __computer__ is worthwhile in any case!

will be matched as a whole, from the first __ to the last __ on the line.
That single match spans three different markups, so only the outermost
pair of __ gets replaced and the ones in between are left unchanged!
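
To see the greedy behaviour in isolation, here is a minimal sketch. It
uses Python's `re` module only as a stand-in (an assumption on my part:
it is not gb.pcre, but `.*` is greedy in the same way for this pattern).
The variable names are made up for illustration.

```python
import re

line = ("The __installation__ of a __SSH server__ on the remote "
        "__computer__ is worthwhile in any case!")

# Greedy: .* first grabs the rest of the line, then backtracks only
# as far as the LAST "__", so one match swallows all three markups.
match = re.search(r"[_]{2}(.*)[_]{2}", line)
captured = match.group(1)
print(captured)
# installation__ of a __SSH server__ on the remote __computer
```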

In gb.pcre, you would use the "frugal" quantifier `*?` instead of `*`:

  [_]{2}(.*?)[_]{2}
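
A quick way to check the effect of `*?` is to list what the group
captures. Again a sketch in Python's `re` as a stand-in for PCRE (an
assumption, not gb.pcre itself); the names are hypothetical.

```python
import re

line = ("The __installation__ of a __SSH server__ on the remote "
        "__computer__ is worthwhile in any case!")

# Frugal: .*? stops at the FIRST closing "__" it can reach,
# so every markup on the line becomes its own match.
spans = re.findall(r"[_]{2}(.*?)[_]{2}", line)
print(spans)
# ['installation', 'SSH server', 'computer']
```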

Since Gambas 3.5, you can use the convenient RegExp.Replace() function,
which compiles quantifiers frugally by default. See the documentation for
more information! Here is the solution, also incorporating the hint by
T Lee about UTF-8 (I don't know if that flag is the default in gb.pcre):

  RegExp.Replace(sDokuWiki, "[_]{2}(.*)[_]{2}", "<uuu>&1<uuu>", RegExp.UTF8)
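
The same idea extends to all three markups from the original question.
The following is only a sketch in Python's `re` (a stand-in, not
gb.pcre); the `MARKUP` table and `convert` function are hypothetical
names, and the `<uuu>`, `<bbb>`, `<iii>` placeholders are the ones from
Hans's attempt. Python's `str` patterns are Unicode-aware, so no
separate UTF-8 flag appears here.

```python
import re

# DokuWiki delimiter -> placeholder tag, as in the original question.
MARKUP = {"__": "uuu", "**": "bbb", "//": "iii"}

def convert(line):
    """Replace each paired delimiter with its placeholder tag."""
    for delim, tag in MARKUP.items():
        d = re.escape(delim)  # "*" is a metacharacter and must be escaped
        # Frugal group so that every pair on the line is handled separately.
        line = re.sub(d + r"(.*?)" + d,
                      lambda m: f"<{tag}>{m.group(1)}<{tag}>",
                      line)
    return line

text = ("The //installation// of a **SSH server** on the remote "
        "__computer__ is worthwhile in any case!")
print(convert(text))
# The <iii>installation<iii> of a <bbb>SSH server<bbb> on the remote <uuu>computer<uuu> is worthwhile in any case!
```

One caveat with this simple approach: a line containing a URL such as
http://... also contains `//`, so the italic pattern may need extra care.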

Best,
Tobias

-- 
"There's an old saying: Don't change anything... ever!" -- Mr. Monk

