[Gambas-user] String processing in Wiki

Laurent Carlier lordheavy at ...512...
Tue Mar 28 18:25:24 CEST 2006


On Tuesday 28 March 2006 17:57, Eilert wrote:
> Benoit Minisini wrote:
> > On Tuesday 28 March 2006 11:44, Eilert wrote:
> >> Today I've translated the functions starting with C into German.
> >>
> >> There are some functions where there is a special hint that they only
> >> deal with ASCII, but Gambas deals with UTF-8.
> >>
> >> Now I have two general questions :-)
> >>
> >> - Why do some of the functions use ASCII and some do not, and are there
> >> plans to make it uniform (all using UTF-8),
> >
> > No.
>
> Ok...
>
> >> or are there other important
> >> reasons not to do so?
> >
> > Yes.
> >
> > There is a difference between a string to translate (that will be
> > displayed somewhere), and a string that must not be translated, i.e. that
> > is just used by the program logic (a collection key, an English-only
> > syntax...).
>
> I had a case where heavy string processing was necessary in one
> of my apps, and converting back and forth to be able to use the
> functions was a bit tricky. So there are times when the programmer
> wishes that all string processing were uniform. Sometimes you need
> to use them with strings that originate from UTF-8.
>
> There is one drawback, however. If a string contains bytes that
> look like UTF-8 sequences but are not meant as UTF-8, the function has
> to know how to handle the string. An additional flag would be necessary
> to implement this, right? Something like gb.ASCII or gb.UTF-8, with the
> default set to whichever one is used more often?
>
> > UTF-8 strings are heavier to process than ASCII strings, as the size of a
> > character is not necessarily one byte.
>
> And this would slow down those native functions?
>
> >> - If it has to be as it is, shouldn't we add a little hint into every
> >> string function to make clear if it is ASCII or UTF-8? Like a symbol or
> >> so...
> >
> > All native string functions are ASCII-only, and all UTF-8 string
> > functions were put in the String static class.
>
> Yes, I know, I use it a lot. But a little hint for the newbies or people
> like me who tend to forget those tricky things wouldn't be wrong, would it?
>
> > The String class is not complete at the moment, and some of its methods
> > do not have well-chosen names.
>
> But I could use it very well.
>
> >> And a special question:
> >>
> >> - I was wondering why the hint "Be careful! The current localization is
> >> not used by this function." is mentioned for functions like CSng(). What
> >> do such functions have to do with localisation, and why doesn't
> >> CShort() mention it then?
> >
> > The decimal separator can be different when the language changes.
> >
> > Integer numbers seem to be written the same way in every language.
>
> Aaah yes, of course :-) I see.
>
>
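
To make the decimal separator point concrete, here is a minimal sketch.
It is written in Python rather than Gambas, and parse_localized is a
hypothetical helper, not a function from any library. It assumes the only
difference between the two notations is the separator character.

```python
def parse_localized(text, decimal_sep="."):
    """Hypothetical helper: parse a float written with the given separator."""
    # Normalize the language-specific separator to "." before converting.
    return float(text.replace(decimal_sep, "."))

# The same value written the English way and the German way.
english = parse_localized("3.14", decimal_sep=".")
german = parse_localized("3,14", decimal_sep=",")

# An integer has no decimal separator, so it needs no such hint.
whole = int("42")
```

A conversion that ignores the current localization behaves like the
default case here: it always expects ".", whatever the language is.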

http://de.wikipedia.org/wiki/UTF-8
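
As a rough illustration of why UTF-8 strings are heavier to process, the
sketch below compares character and byte counts. It uses Python instead of
Gambas, and the sample word is my own choice, so take it as an assumption
and not as how the Gambas functions are implemented.

```python
text = "Grüße"                  # 5 characters, two of them non-ASCII
encoded = text.encode("utf-8")  # the bytes a byte-oriented function sees

char_count = len(text)          # counts characters
byte_count = len(encoded)       # counts bytes; "ü" and "ß" take 2 bytes each

# Cutting at a byte offset can split a multi-byte character in half.
broken = encoded[:3]            # b"Gr" plus only the first byte of "ü"
try:
    broken.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False
```

A function that treats one byte as one character reports the byte count
and can cut in the middle of a character, which is why the UTF-8 aware
functions have to do more work.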

Regards,

-- 
jabber : lordheavy at ...943...
mail : lordheavymREMOVEME at ...626...