[Gambas-user] Detecting a non-UTF character in a string

Benoît Minisini gambas at ...1...
Fri Mar 28 16:45:31 CET 2014


On 26/03/2014 03:51, bbruen wrote:
> I get an occasional* error with downloaded data which results in a
> postgresql update failure with the following error "ERR: Cannot create
> record: ERROR:  invalid byte sequence for encoding "UTF8": 0xc2". I know
> how to fix that as long as I can detect the bad char in the string. That
> is what I don't know; in fact, I haven't got the foggiest clue.
>
> The input data is split out from a text file downloaded via FTP. The bad
> data lines come from a particular source but only occur occasionally.
> They always occur at the same "place" in the parsed file (i.e. a "Title"
> field, which is a moderately long string extracted using a Scan()
> function). They always appear at the same place in the field (i.e. at
> the very end of the field).
>
> So, how can I test if Right(sTitle,1) = NotARealCharacter?
>
> tia
> Bruce
>

Gambas is missing this feature: a String.IsValid() method that would tell
you whether a string is valid UTF-8.

At the moment, you must validate your string manually.
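
A manual check can be sketched as a byte-by-byte walk over the string.
This is only a rough sketch under some assumptions: the function name
LooksLikeUtf8 is made up, it relies on Len() and Asc() working on bytes
rather than on UTF-8 characters, and it only checks structure (a lead byte
followed by the right number of continuation bytes), so it does not reject
every overlong or surrogate encoding.

```gambas
' Hypothetical helper, not part of Gambas: structural UTF-8 check.
Public Function LooksLikeUtf8(sText As String) As Boolean

  Dim iPos As Integer = 1
  Dim iLen As Integer = Len(sText)   ' Len() counts bytes here
  Dim iByte As Integer
  Dim iMore As Integer               ' continuation bytes expected
  Dim iInd As Integer

  While iPos <= iLen
    iByte = Asc(sText, iPos)
    If iByte < &H80 Then
      iMore = 0                      ' plain ASCII
    Else If (iByte >= &HC2) And (iByte <= &HDF) Then
      iMore = 1                      ' lead byte of a 2-byte sequence
    Else If (iByte >= &HE0) And (iByte <= &HEF) Then
      iMore = 2                      ' lead byte of a 3-byte sequence
    Else If (iByte >= &HF0) And (iByte <= &HF4) Then
      iMore = 3                      ' lead byte of a 4-byte sequence
    Else
      Return False                   ' stray continuation or invalid lead byte
    Endif

    ' A lead byte at the very end of the string is a truncated sequence.
    If iPos + iMore > iLen Then Return False

    ' Every continuation byte must lie in the range &H80..&HBF.
    For iInd = 1 To iMore
      iByte = Asc(sText, iPos + iInd)
      If (iByte < &H80) Or (iByte > &HBF) Then Return False
    Next

    iPos += iMore + 1
  Wend

  Return True

End
```

With the error quoted above, a lone 0xC2 at the very end of the field is a
lead byte with no continuation byte after it, so the truncation test is the
one that should fail for such a title.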

Or maybe there is a trick:

	Conv$(MyString, "UTF-8", "UCS-4LE")

Conv$() may force a validation of MyString (I'm not sure, as the conversion
is done internally by the GNU iconv library). So if Conv$() raises an error,
it should mean that MyString is not a valid UTF-8 string.
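
The trick could be wrapped in a small function using Try and Error. This is
a minimal sketch, not a tested solution: the name IsValidUtf8 is made up,
and it only works if iconv really rejects invalid input, which is the
unverified assumption stated above.

```gambas
' Hypothetical helper, not part of Gambas: returns True if the conversion
' from UTF-8 succeeds. Assumes iconv raises an error on invalid UTF-8.
Public Function IsValidUtf8(sText As String) As Boolean

  Dim sIgnored As String

  ' The converted result is thrown away; only success or failure matters.
  Try sIgnored = Conv$(sText, "UTF-8", "UCS-4LE")
  If Error Then Return False
  Return True

End
```

For the case in the question, where the bad byte is always the last one in
the field, a possible use would be:

```gambas
' Left$() with a negative length returns everything except the last byte.
If Not IsValidUtf8(sTitle) Then sTitle = Left$(sTitle, -1)
```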

Regards,

-- 
Benoît Minisini



