Gambas and UTF-8 (Was Re: [Gambas-user] a bit of confusion)
Benoit Minisini
gambas at ...1...
Thu Dec 23 10:37:32 CET 2004
On Thursday 23 December 2004 00:00, Stefan Lamprecht wrote:
> I spent days finding out how to work correctly with UTF-8 strings.
> The information about this was close to zero (0), and in some cases
> misleading.
> I suggest some kind of 'elegant' programming contest to gather more
> complete samples.
>
> feed the gambas
>
Gambas uses ASCII internally, but uses UTF-8 for strings that should be
displayed, and for file names inside components.
For those who don't know what UTF-8 and Unicode are, see:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Qt uses Unicode internally, so every string sent to or received from Qt is
converted from or to UTF-8. Note that when you type a string in the IDE, you
use Qt indirectly, so everything you type is UTF-8. Create a text file with
the IDE, enter non-ASCII characters, save it, then open it with KWrite, and
you will see what I mean.
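To make the point about UTF-8 files concrete, here is a small illustration in Python. It is not Gambas code; Python is used only because it shows the raw bytes clearly. The example encodes "é" as UTF-8, as the Qt-based IDE does when it saves, and then decodes those bytes as if they were ISO 8859-1. This is the kind of garbling you typically see when a UTF-8 file is opened by an editor that assumes Latin-1.

```python
# Illustration in Python (not Gambas): what UTF-8 text looks like as raw bytes.
text = "é"

# Encoding to UTF-8 turns the single character into two bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)        # b'\xc3\xa9'

# Reading those same bytes as ISO 8859-1 (Latin-1) yields two wrong characters.
misread = utf8_bytes.decode("iso8859-1")
print(misread)           # Ã©
```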
When I say that Gambas uses ASCII internally, I mean that EVERY native Gambas
string function works on bytes, as if every character were ASCII: Left$(),
Mid$(), Right$(), Instr(), Len(), ...
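Because these functions count bytes, cutting a UTF-8 string with Left$() or Mid$() can split a multi-byte character in half. The Python sketch below illustrates this behaviour. It uses a hypothetical left_bytes() helper that, like a byte-oriented Left$(), keeps the first n bytes. It is not how Gambas implements Left$().

```python
def left_bytes(data: bytes, n: int) -> bytes:
    """Hypothetical byte-oriented Left$(): keep the first n bytes."""
    return data[:n]

word = "été".encode("utf-8")   # 5 bytes: c3 a9 74 c3 a9

ok = left_bytes(word, 3)       # b'\xc3\xa9t', a complete "ét"
print(ok.decode("utf-8"))

broken = left_bytes(word, 1)   # b'\xc3', only the first half of "é"
try:
    broken.decode("utf-8")
except UnicodeDecodeError as err:
    # The cut landed inside a character, so the result is not valid UTF-8.
    print("invalid UTF-8:", err.reason)
```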
When you want to deal with UTF-8 strings, use the class named "String": its
many static methods process UTF-8 strings instead of ASCII ones. For example,
Len("é") = 2 because "é" takes two bytes in UTF-8, whereas String.Len("é") = 1.
Read the wiki for more information about each method.
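The difference between Len() and String.Len() is the difference between counting bytes and counting characters. Here is a rough Python analogue, which assumes the string is held as UTF-8 bytes. The helper names gambas_len, string_len and string_left are hypothetical and only mirror the idea behind the Gambas functions. This is not how Gambas implements them.

```python
def gambas_len(data: bytes) -> int:
    """Like Len(): counts bytes."""
    return len(data)

def string_len(data: bytes) -> int:
    """Like String.Len(): counts characters in UTF-8 data."""
    return len(data.decode("utf-8"))

def string_left(data: bytes, n: int) -> bytes:
    """A character-aware Left$(): first n characters, never splitting one."""
    return data.decode("utf-8")[:n].encode("utf-8")

s = "é".encode("utf-8")
print(gambas_len(s), string_len(s))   # 2 1
```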
Gambas has a conversion function named... Conv$() that can convert a string
from (almost) any charset to any other. For example, Conv$("àéî", "UTF-8",
"ISO8859-1") converts a UTF-8 string to ISO 8859-1.
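In the example above, Conv$() takes the string first, then the source charset, then the destination charset. The equivalent operation in Python is a decode followed by an encode. The conv() function below is a hypothetical helper that mirrors that argument order. It is not a real Gambas or Python API.

```python
def conv(data: bytes, from_charset: str, to_charset: str) -> bytes:
    """Re-encode bytes from one charset to another, in the spirit of Conv$()."""
    return data.decode(from_charset).encode(to_charset)

utf8 = "àéî".encode("utf-8")                 # 6 bytes: two per character
latin1 = conv(utf8, "UTF-8", "ISO8859-1")     # 3 bytes: e0 e9 ee
print(len(utf8), len(latin1))                 # 6 3
```

Converting back with conv(latin1, "ISO8859-1", "UTF-8") restores the original six bytes.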
While the Qt component returns UTF-8 strings, the output of shell commands
uses the charset of the system. To deal with that problem, you must use
Conv$() and the following two class properties: System.Charset, which
returns the charset used by the system, and Desktop.Charset, which returns
the charset used by the GUI component.
On Fedora, Desktop.Charset = System.Charset = "UTF-8", but not on Mandrake,
where System.Charset depends on your system language.
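Put together, the recipe is: read the command's output as bytes, treat it as being in System.Charset, and convert it to Desktop.Charset before handing it to the GUI. Below is a Python sketch of that pipeline. It makes two substitutions for illustration: locale.getpreferredencoding() stands in for System.Charset, and a fixed "UTF-8" stands in for Desktop.Charset. The helpers to_gui() and run_for_gui() are hypothetical.

```python
import locale
import subprocess

GUI_CHARSET = "UTF-8"  # assumption: stands in for Desktop.Charset (Qt side)

def to_gui(raw: bytes, system_charset: str, gui_charset: str = GUI_CHARSET) -> bytes:
    """Convert command output from the system charset to the GUI charset."""
    return raw.decode(system_charset, errors="replace").encode(gui_charset)

def run_for_gui(cmd) -> bytes:
    """Run cmd (an argument list such as ["ls", "-l"]) and re-encode its output."""
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    # Stand-in for System.Charset: the charset implied by the user's locale.
    system_charset = locale.getpreferredencoding(False)
    return to_gui(raw, system_charset)

# On a system whose charset is ISO 8859-1, "café" arrives as b"caf\xe9"
# and must become the UTF-8 bytes b"caf\xc3\xa9" for the GUI.
print(to_gui(b"caf\xe9", "ISO8859-1"))
```

On a system where both charsets are already UTF-8, as on Fedora, the conversion leaves the bytes unchanged.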
When all Linux systems become UTF-8 based, things will be simpler.
I hope things are clearer. Do not hesitate to ask questions about that.
Regards,
--
Benoit Minisini
mailto:gambas at ...1...