[Gambas-user] Boxed string question
Bruce
adamnt42 at gmail.com
Thu Sep 12 00:44:51 CEST 2019
"Difficult"? More like d*** near impossible, even in English. From our
horse days, here's a list of the cases handled by our old "ProperName"
function. (It was not called Capitalize because it was designed for the
names of people and horses; I would think Capitalize would be a generic,
lower-level function that upper-cases the leftmost character of a
single-word string.)
' Copes with normal multiword names: James Brown, BLACK CAVIAR etc
' Copes with words within parentheses: James BROWN (Snr)
' Copes with hyphenation, suffixes and titles: "Mrs Daisy-May Polkinghorne-Throckwhistle III"
' Copes with apostrophised names: O'BRIEN (O'Brien), d'AMICO (d'Amico)
' Copes with apostrophised abbreviations: BLACK N' TAN
' Copes with the MacDonalds and McDonalds
' Copes with handling non-scots like the Macedons
' Can handle DeNiro (in our convention this must be two words!)
' Can handle subparts VAN DER GRAF (by convention -> VanDer Graf)
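
To make a few of those cases concrete, here is a minimal sketch in Python (not Gambas, and not our original ProperName code). The function name proper_name and the small lookup sets are hypothetical placeholders; a real system would need much longer exception lists. It deliberately leaves out the DeNiro and VAN DER GRAF conventions.

```python
import re

# Illustrative, assumed lists; real data would need many more entries.
ROMAN_SUFFIXES = {"II", "III", "IV"}
NOT_SCOTTISH = {"MACEDON"}            # "Mac..." names that are not Mac + Name
LOWER_APOSTROPHE_PREFIXES = {"D"}     # d'Amico keeps a lower-case "d"

SEPARATORS = r"[\s\-()]+"             # spaces, hyphens and parentheses


def _cap(part):
    """Upper-case the first character and lower-case the rest."""
    return part[:1].upper() + part[1:].lower()


def _word(word):
    """Apply the name conventions to one separator-free word."""
    upper = word.upper()
    if upper in ROMAN_SUFFIXES:
        return upper
    if "'" in word:
        head, tail = word.split("'", 1)
        if tail and head.upper() in LOWER_APOSTROPHE_PREFIXES:
            return head.lower() + "'" + _cap(tail)
        return _cap(head) + "'" + _cap(tail)
    if upper not in NOT_SCOTTISH:
        for prefix in ("MAC", "MC"):
            if upper.startswith(prefix) and len(upper) > len(prefix) + 1:
                return _cap(prefix) + _cap(word[len(prefix):])
    return _cap(word)


def proper_name(name):
    """Capitalise each word while keeping the separators exactly as entered."""
    tokens = re.split("(" + SEPARATORS + ")", name.strip())
    return "".join(t if re.fullmatch(SEPARATORS, t) else _word(t) for t in tokens)
```

Even this small version shows where the trouble starts: every exception (Macedon, d'Amico, III) is a hand-maintained list entry, not a rule.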
Normalisation is even harder. Given the model:
Data Acquisition -> Data Storage -> Data Display
we found
a) humans are incredibly inventive when coming up with new ways to enter
the same name
b) partnerships and company names are very often entered into the name
field when you are expecting a person's name
c) "Normalization" for data storage requires creating your own
"conventions". For example, we found that storing "Vander Graf" suited us
better: in our data population, having all the "Van"s, "Vander"s, "Von"s
etc. sorted together made manual error detection a bit easier.
d) (The big one!) When displaying the stored values it was better to
"semi-denormalize" them as it was easier to spot errors. For example,
once "Johannes and Maria van der Graf" had been normalised to "Vander
Graf, Johannes&Maria" we de-normalised for display to "Van der Graf,
Johannes and Maria".
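
As a rough illustration of that storage/display split, here is a minimal Python sketch. The function names and the particle table are assumptions made up for this example, not our actual code; only the "van der" -> "Vander" -> "Van der" round trip comes from the case above.

```python
# Assumed table: entered particle (lower case) -> fused form used for storage.
STORED_PARTICLES = {"van der": "Vander"}
# Reverse table for display: "Vander" -> "Van der".
DISPLAY_PARTICLES = {
    stored: particle.capitalize() for particle, stored in STORED_PARTICLES.items()
}


def normalise(entered):
    """Turn 'Given and Given particle Surname' into 'Particle Surname, Given&Given'."""
    words = entered.split()
    lowered = [w.lower() for w in words]
    for particle, stored in STORED_PARTICLES.items():
        parts = particle.split()
        n = len(parts)
        # The particle must be followed by at least one surname word.
        for i in range(len(words) - n):
            if lowered[i:i + n] == parts:
                given = "&".join(w for w in words[:i] if w.lower() != "and")
                surname = " ".join([stored] + words[i + n:])
                return f"{surname}, {given}"
    # Unrecognised input is left untouched for manual review.
    return entered


def denormalise_for_display(stored):
    """Expand the stored form into something easier for a human to check."""
    surname, _, given = stored.partition(", ")
    first, _, rest = surname.partition(" ")
    surname = f"{DISPLAY_PARTICLES.get(first, first)} {rest}".strip()
    given = given.replace("&", " and ")
    return f"{surname}, {given}" if given else surname
```

With this table, normalise("Johannes and Maria van der Graf") gives "Vander Graf, Johannes&Maria", and denormalise_for_display() turns that back into "Van der Graf, Johannes and Maria".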
We did try at one time to design a "rule based" approach to this but it
never got very far.
So best of luck to those attempting to develop their own approach to
handling name capitalization, normalisation and display.
regards
b
On 12/9/19 1:48 am, Benoît Minisini wrote:
> On 11/09/2019 at 15:44, Gianluigi wrote:
>> Hi Cedron,
>> thank you very much for all the suggestions ;-)
>>
>> Regards
>> Gianluigi
>>
>
> Why the name "ProperCase"?
>
> Anyway, it's difficult to add a function that should behave differently
> according to the current language (proper case is not the same thing
> in French and Italian, not to mention Chinese and Arabic) if that
> information is not available in the libc.
>
> The right name would be "String.CapitalizeWords".
>
> And String.UCaseFirst() should have been named String.Capitalize().
>
> As for Cedron's suggestion ('HomeAddress' <==> 'home_address'), it is a
> good idea too. The generic name for that sort of thing is "normalization".
>
> But if you normalize for a database, you should put the function in the
> database component, which has, by the way, quoting functions that should
> allow you to name your identifiers as you want.
>
> Regards,
>
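
For reference, the 'HomeAddress' <==> 'home_address' conversion mentioned in the quoted message can be sketched as follows. This is a minimal Python illustration with hypothetical function names, not an existing Gambas function, and it assumes plain ASCII identifiers without acronyms (something like 'HTTPServer' would not round-trip).

```python
import re


def to_snake_case(identifier):
    """'HomeAddress' -> 'home_address'."""
    # Insert "_" wherever a lower-case letter or digit is followed by a capital.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", identifier).lower()


def to_pascal_case(identifier):
    """'home_address' -> 'HomeAddress'."""
    return "".join(part[:1].upper() + part[1:] for part in identifier.split("_"))
```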