[Gambas-user] Detecting a non-UTF character in a string

Benoît Minisini gambas at ...1...
Sat Mar 29 05:41:33 CET 2014


On 28/03/2014 16:45, Benoît Minisini wrote:
> On 26/03/2014 03:51, bbruen wrote:
>> I get an occasional* error with downloaded data that causes a
>> PostgreSQL update failure with the following error: "ERR: Cannot create
>> record: ERROR:  invalid byte sequence for encoding "UTF8": 0xc2". I know
>> how to fix that as long as I can detect the bad character in the string.
>> That is what I don't know; in fact, I haven't got the foggiest clue.
>>
>> The input data is split out from a text file downloaded via FTP. The bad
>> data lines come from a particular source but only occur occasionally.
>> They always occur at the same "place" in the parsed file (i.e. a "Title"
>> field, which is a moderately long string extracted using the Scan()
>> function). They always appear at the same place in the field, i.e. at
>> the very end of the field.
>>
>> So, how can I test if Right(sTitle,1) = NotARealCharacter?
>>
>> tia
>> Bruce
>>
>
> This is missing in Gambas: a String.IsValid() method that would tell you
> whether a string is valid UTF-8.
>
> At the moment, you must validate your string manually.
>
> Or maybe there is a trick:
>
>      Conv$(MyString, "UTF-8", "UCS-4LE")
>
> Conv$() may force a validation of MyString (I'm not sure, as this is done
> by the GNU iconv library used internally). So if Conv$() raises an error,
> it must mean that MyString is not a valid UTF-8 string.
>
> Regards,
>
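
A minimal sketch of the Conv$() trick quoted above, wrapped in a Gambas function. The name ConvAccepts is hypothetical, and as the quoted message says, whether Conv$() rejects every invalid sequence depends on the iconv library underneath, so a successful conversion should be read as a hint rather than proof of validity.

```gambas
' Hypothetical helper (not part of Gambas): returns False if converting
' the string from UTF-8 to UCS-4LE raised an error.
' Assumption: the iconv library behind Conv$() reports invalid input as an
' error; this is not guaranteed for every iconv implementation.
Public Function ConvAccepts(sText As String) As Boolean

  Dim sIgnored As String

  ' Try suppresses the error; the global Error flag tells us if one occurred.
  Try sIgnored = Conv$(sText, "UTF-8", "UCS-4LE")
  If Error Then Return False
  Return True

End
```
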

Oops. Actually String.IsValid() already exists. :-)

So you can just use it!
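
A minimal sketch of how String.IsValid() could answer the original question. The helper name StripTrailingInvalid is hypothetical, and it assumes, as in the report, that the bad bytes only ever sit at the very end of the field. 0xc2 is a UTF-8 lead byte that must be followed by a continuation byte, so a field ending in 0xc2 probably holds a character that was cut in half. Note that Left$() works on bytes, not on UTF-8 characters.

```gambas
' Hypothetical helper (not part of Gambas): removes up to three trailing
' bytes until the string is valid UTF-8. A truncated multi-byte sequence
' is at most three bytes long, so if the string is still invalid after
' that, the bad bytes are elsewhere and the original string is returned.
Public Function StripTrailingInvalid(sText As String) As String

  Dim sResult As String = sText
  Dim iRemoved As Integer

  ' String.IsValid() returns True when the string is valid UTF-8.
  While (Not String.IsValid(sResult)) And (iRemoved < 3)
    ' Left$() with a negative length drops that many bytes from the end.
    sResult = Left$(sResult, -1)
    Inc iRemoved
  Wend

  ' Still invalid: leave the decision to the caller.
  If Not String.IsValid(sResult) Then Return sText
  Return sResult

End
```

For example, running sTitle = StripTrailingInvalid(sTitle) before the PostgreSQL update, then checking String.IsValid(sTitle) once more, would let you log or skip any record that is still not valid UTF-8.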

-- 
Benoît Minisini
