[Gambas-user] Isn't bracket regular expression compatible with UTF8?

Tobias Boege taboege at ...626...
Wed Jul 5 11:37:59 CEST 2017


On Tue, 04 Jul 2017, Fernando Cabral wrote:
> I have been trying something like *poder[^[:alpha:]* so I could find the
> word "poder " ("poder" followed by a space) but not "poderão" ("ã" being
> an alpha character in Portuguese).
> 
> In English it could be like finding "power" but not "powerless".
> 
> Problem is that it seems [^[alpha]] includes accented characters like "á",
> "é", "ã".
> 
> That is, accented characters are understood not as alpha, but as non-alpha.
> 
> Please, note that I have compiled it with the UTF8 flag:
> *   re.Compile(poder[^[:alpha]], RegExp.utf8)*
> 
> Any hints?
> 

In your mail I can see three distinct attempts at writing down a
negated character class: [^[:alpha:], [^[alpha]], and [^[:alpha]].
None of them spells the POSIX class correctly; the correct syntax is

  [[:^alpha:]]

You want to check this first.
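
If it helps to see how such a pattern behaves, here is a rough sketch.
It is written in Go rather than Gambas only so that it is self-contained,
and that is an assumption on my part: Go's regexp package is not PCRE, its
POSIX classes are ASCII-only, and the helper name "matches" is made up for
this example. It cannot tell you how gb.pcre with RegExp.UTF8 classifies
"ã"; that has to be checked in Gambas itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern finds a match anywhere in s.
// (Hypothetical helper, introduced only for this illustration.)
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	words := []string{"poder ", "poderes", "poderão"}
	patterns := []string{
		`poder[[:^alpha:]]`, // negated POSIX class
		`poder[^[:alpha:]]`, // negated bracket expression containing the class
		`poder\P{L}`,        // "not a Unicode letter" (Unicode property)
	}
	for _, p := range patterns {
		for _, w := range words {
			// Print pattern, quoted subject, and whether it matched.
			fmt.Printf("%-20s %-10q %v\n", p, w, matches(p, w))
		}
	}
}
```

In Go, both POSIX spellings accept "poder " and reject "poderes", but they
also accept "poderão", because "ã" lies outside the ASCII range that Go's
[:alpha:] covers. The \P{L} form rejects "poderão". If a well-formed class
still lets "poderão" through in Gambas, a similar ASCII-only interpretation
could be the reason, but I have not verified that.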

Regards,
Tobi

-- 
"There's an old saying: Don't change anything... ever!" -- Mr. Monk



