[Gambas-user] String processing in Wiki

Tue Mar 28 18:17:22 CEST 2006

On Tuesday 28 March 2006 17:57, Eilert wrote:
> Benoit Minisini wrote:
> > On Tuesday 28 March 2006 11:44, Eilert wrote:
> >> Today I've translated the functions starting with C into German.
> >>
> >> There are some functions where there is a special hint that they only
> >> deal with ASCII, but Gambas deals with UTF-8.
> >>
> >> Now there are two general questions of mine :-)
> >>
> >> - Why do some of the functions use ASCII and some do not, and are there
> >> plans to make it uniform (all using UTF-8),
> >
> > No.
>
> Ok...
>
> >> or are there other important
> >> reasons not to do so?
> >
> > Yes.
> >
> > There is a difference between a string to be translated (one that will be
> > displayed somewhere) and a string that must not be translated, i.e. one
> > that is used only by the program logic (a collection key, an English-only
> > syntax...).
>
> In one of my apps, heavy string processing was necessary, and converting
> back and forth to be able to use the functions was a bit tricky. So there
> are times when a programmer wishes all string processing were uniform.
> Sometimes you need to use these functions with strings that originate from
> UTF-8.

An ASCII string is a valid UTF-8 string. And nothing prevents you from using
native ASCII string functions with UTF-8 strings, provided that you know what
you are doing. There is no need to "convert" a UTF-8 string to ASCII!
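To illustrate the point above, here is a minimal sketch in Python rather than Gambas (a choice made only for illustration): Python bytes stand in for byte-oriented native functions, and str for character-aware ones. It checks that ASCII bytes decode identically as UTF-8, and that splitting UTF-8 data byte by byte on an ASCII separator is safe. The function names are hypothetical.

```python
# Sketch in Python, not Gambas: bytes objects stand in for byte-oriented
# ("native") string functions, str for character-aware (UTF-8) ones.


def ascii_is_valid_utf8() -> bool:
    """Every 7-bit ASCII byte sequence decodes to the same text as UTF-8."""
    data = bytes(range(128))
    return data.decode("utf-8") == data.decode("ascii")


def split_on_ascii(data: bytes, sep: bytes) -> list:
    """Byte-wise split. Safe on UTF-8 input when sep is ASCII, because every
    byte of a multi-byte UTF-8 sequence is >= 0x80 and so can never match."""
    return data.split(sep)


text = "Grüße,Straße,Köln".encode("utf-8")
fields = split_on_ascii(text, b",")
print(ascii_is_valid_utf8())                 # True
print([f.decode("utf-8") for f in fields])   # ['Grüße', 'Straße', 'Köln']
print(len(text), len(text.decode("utf-8")))  # 21 17 (bytes vs characters)
```

The caveat "provided that you know what you are doing" shows up in the last line: byte counts and character counts differ, so cutting UTF-8 data at an arbitrary byte position could split a character in two.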

>
> There is one drawback, however. If there is a string with characters
> that look like UTF-8 but are not meant as UTF-8, the function will have
> to know how to handle the string. An additional flag would be necessary
> to implement this, right? Something like gb.ASCII or gb.UTF-8, with the
> default set to whichever one thinks is used more often?
>

What are you talking about? Are you sure that you know exactly how UTF-8
works? Please explain in detail what exactly you want to do...

> > UTF-8 strings are more costly to process than ASCII strings, because a
> > character is not necessarily one byte long.
>
> And this would slow down those native functions?

Of course!
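A rough sketch of where the extra cost comes from, again in Python and with hypothetical helper names: locating the n-th character of UTF-8 data requires a scan from the start, whereas with ASCII the n-th character is simply the n-th byte. It assumes the input is valid UTF-8.

```python
# Sketch in Python: why character-level operations on UTF-8 cost more.
# With single-byte ASCII, character n is simply byte n (O(1)). With UTF-8,
# a character may span 1 to 4 bytes, so finding character n means scanning.
# Assumes valid UTF-8 input; no error handling for malformed sequences.


def is_continuation(b: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (b & 0xC0) == 0x80


def utf8_char_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th character (0-based): an O(len(data)) scan."""
    count = -1
    for i, b in enumerate(data):
        if not is_continuation(b):  # this byte starts a new character
            count += 1
            if count == n:
                return i
    raise IndexError("character index out of range")


def utf8_char_at(data: bytes, n: int) -> str:
    """Return the n-th character: scan to its first byte, then step over
    its continuation bytes."""
    start = utf8_char_offset(data, n)
    end = start + 1
    while end < len(data) and is_continuation(data[end]):
        end += 1
    return data[start:end].decode("utf-8")


word = "Köln".encode("utf-8")     # b'K\xc3\xb6ln': 5 bytes, 4 characters
print(utf8_char_at(word, 1))      # ö
print(utf8_char_offset(word, 2))  # 3, not 2
print(chr(b"Koln"[2]))            # l: in ASCII, byte index == character index
```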

>
> >> - If it has to be as it is, shouldn't we add a little hint to the
> >> documentation of every string function to make clear whether it is ASCII
> >> or UTF-8? Like a symbol or something...
> >
> > All native string functions are ASCII-only, and all UTF-8 string
> > functions were put in the String static class.
>
> Yes, I know, I use it a lot. But a little hint for newbies, or for people
> like me who tend to forget those tricky things, wouldn't hurt, would it?
>

Yes. I admit this is not completely clear :-) What is missing in the 
documentation is a set of overviews.

> > The String class is not complete at the moment, and some of its methods
> > do not have well-chosen names.
>
> But I have been able to use it very well.
>
> >> And a special question:
> >>
> >> - I was wondering why the hint "Be careful! The current localization is
> >> not used by this function." is mentioned for functions like CSng(). What
> >> do such functions have to do with localization, and why doesn't CShort()
> >> mention it, then?
> >
> > The decimal separator can be different when the language changes.
> >
> > Integer numbers seem to be written the same way in every language.
>
> Aaah yes, of course :-) I see.
>
>
> Rolf
>
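The decimal separator issue can be sketched in Python as well (an illustration, not Gambas code). Python's float() is locale-independent and accepts only '.', which is presumably the kind of behaviour the Wiki hint warns about for CSng(); parse_with_separator() is a hypothetical helper that accepts a comma, as written in German.

```python
# Sketch in Python, not Gambas. Python's float() ignores the locale and only
# accepts '.' as the decimal separator.
# parse_with_separator() is a hypothetical helper, not a Gambas function.


def parse_with_separator(text: str, decimal_sep: str) -> float:
    """Parse a number written with the given decimal separator.
    Assumes no thousands separators are present."""
    return float(text.replace(decimal_sep, "."))


print(float("3.5"))                      # 3.5
print(parse_with_separator("3,5", ","))  # 3.5 (German-style input)
print(int("42"))                         # 42: no separator involved

try:
    float("3,5")
except ValueError:
    print("float() rejects '3,5'")
```

Integers need no decimal separator at all, which matches the explanation of why CShort() carries no such hint.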

Regards,

-- 
Benoit Minisini