[Gambas-user] Sorting the human way

Rolf-Werner Eilert eilert-sprachen at ...221...
Wed Oct 23 09:58:27 CEST 2013



On 23.10.2013 09:20, Alain Baudrez wrote:
> Hoi,
>
> I've never been happy with the standard sorting algorithm when dealing with
> lists of names. The human eye expects the names to be listed
> alphabetically, overlooking spaces, hyphens, accented characters, ...
>
> Assume the following names:
>       - Benoizy
>       - Benoît
>       - Benï Lewis
>       - Benoix
>       - Ben Underwood
>
> Sorting them using the default sort method results in the following list:
>       - Ben Underwood
>       - Benoix
>       - Benoizy
>       - Benoît
>       - Benï Lewis
>
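>         Ben Underwood comes first because the space has a code point of 32
> (U+0020), which is lower than that of any letter.
>         Benoît comes last in the Benoi series of names because î has a code
> point of 238 (U+00EE), which is higher than that of any unaccented letter.
>         Benï Lewis comes last in the list because ï has a code point of 239
> (U+00EF).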
>
> Sorting the list with the function Alfabet instead returns the original
> strings in this order:
>       - Benï Lewis
>       - Benoît
>       - Benoix
>       - Benoizy
>       - Ben Underwood
>
>         which is the normal order a human expects to see when spaces,
> accents, umlauts, ... are ignored.
>
> Attached is a function I have written that strips all non-letter
> characters from a string and returns a plain ASCII string with all
> characters in the range "a"-"z". I have also included a small snapshot of
> an actual list sorted alphabetically in one of my programs. As you can
> see, the names are truly listed 'by Alphabet'.
>
> Feel free to use the function or maybe the concept could be incorporated in
> a future build of Gambas
>
> Alain J. Baudrez
>

Hi Alain,

Thanks for that interesting approach. I did a similar thing some years 
ago to sort lists of (mainly German) names. But it is not all that easy 
in every country.
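
To illustrate the stripping idea described in the quoted message, here is a
minimal sketch in Python. It is not Alain's attached Gambas function (which
is not reproduced here); the name alphabet_key is made up for this example,
and it assumes that accents can be removed by Unicode decomposition.

```python
import unicodedata

def alphabet_key(name):
    # Hypothetical helper: decompose accented letters
    # (e.g. "î" becomes "i" plus a combining circumflex),
    # then keep only the plain letters a-z.
    # Letters that do not decompose (such as "ß" or "ø") are simply dropped.
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")

names = ["Benoizy", "Benoît", "Benï Lewis", "Benoix", "Ben Underwood"]

# Default sort compares code points: space first, accented letters last.
print(sorted(names))

# Sorting on the stripped key gives the order from the quoted message,
# while the original strings are kept for display.
print(sorted(names, key=alphabet_key))
```

The key is only used for comparison, so the list still shows the names with
their spaces and accents.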

You have to know that our umlauts are sorted like vowel + "e", i.e. "ä" 
= "ae" (which is its historical representation). And in office files (I 
mean the paper ones) or telephone registers, we use separate tabs for 
"St" and "Sch".
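
The umlaut rule can be sketched like this in Python. This is only an
illustration under assumptions: german_key and the sample names are invented
for the example, and only the three umlauts mentioned above are handled.

```python
# Map each umlaut to vowel + "e" before comparing.
UMLAUT_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})

def german_key(name):
    # Hypothetical helper: lower-case first so that "Ä", "Ö", "Ü" are covered too.
    return name.lower().translate(UMLAUT_MAP)

# Made-up sample names, not taken from a real list.
names = ["Muller", "Mueller", "Müller", "Moser", "Mötz"]
print(sorted(names, key=german_key))
```

With this key "Müller" and "Mueller" compare as equal, so they keep the
order they had in the input.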

But beware: not everybody does it that way. The Swedes, for instance, 
treat the umlauts as separate letters that come at the end of the 
alphabet. So in a Swedish dictionary, you will find a word starting 
with an "ä" after the words starting with "z". (And I would expect 
"Hägar" to appear after "Hazufel".)
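
A rough Python sketch of the Swedish ordering, assuming the alphabet runs
a-z followed by å, ä, ö. The name swedish_key and the words "Zeta" and
"Ärla" are made up for the example; characters outside that alphabet are
simply skipped here.

```python
import unicodedata

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word):
    # Hypothetical helper: compose first so that "a" + combining diaeresis
    # becomes a single "ä", then rank every letter by its alphabet position.
    composed = unicodedata.normalize("NFC", word.lower())
    return [RANK[ch] for ch in composed if ch in RANK]

words = ["Hägar", "Zeta", "Hazufel", "Ärla"]
print(sorted(words, key=swedish_key))
```

Here "Hägar" lands after "Hazufel", and a word starting with "Ä" lands after
everything starting with "Z".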

My own algorithm sorts strictly the German way, except that it does not 
group "St" and "Sch" separately.

Rolf



