[Gambas-user] problems in trie class
Tobias Boege
taboege at ...626...
Mon Dec 1 18:28:44 CET 2014
On Mon, 01 Dec 2014, Beno?t Minisini wrote:
> > (After writing about half an hour complaining how hard it would be to get
> > case-insensitivity right and efficient) I just had a magnificent idea: I
> > will extend the native Trie class in Gambas and do something like that:
> >
> > ' Written from scratch, may contain syntax, etc. errors
> >
> > Public Struct _Trie_Entry
> > Key As String
> > Value As Variant
> > End Struct
> >
> > Public Sub Add(Value As Variant, Key As String)
> > Dim hEntry As New _Trie_Entry
> > Dim sKey As String = Key
> >
> > If $iMode = gb.IgnoreCase Then sKey = String.Upper(Key)
> > hEntry.Key = Key
> > hEntry.Value = Value
> > Super.Add(Value, sKey)
> > End
> >
> > Similarly I can augment _get, _put, etc. so that you won't notice that the
> > _Trie_Entry structure exists at all.
> >
> > If you request a case-insensitive Trie, all keys are upper-case'd internally
> > and the real keys are saved as part of the stored object. So you can get
> > your original key back later in an enumeration and I don't have to add
> > branches in the hot paths of the trie code to support case-insensitivity
> > (which would make the whole thing slower -- I don't know if it would be
> > noticeable, but...). I will see if I can do that (_next() may impose a
> > little problem or maybe not).
> >
> > @Benoit: I am not familiar with the caveats of UTF-8 strings. If I want to
> > implement case-insensitivity by internally converting all characters to
> > upper-case, is it sufficient to use String.Upper() or are there hidden
> > pitfalls?
> >
> > Regards,
> > Tobi
> >
>
> If they are keys, they should be ASCII, not UTF-8, like with the
> Collection class.
>
Currently the Trie supports arbitrary sequences of bytes as keys. It would
take some time to move the code back in that restricts the key space to
ASCII sequences. Consequently I'd like to leave it as-is.
> If case insensitive is really needed, you should really manage that
> inside the C code of the component.
>
Why? With the above method I use some more space but avoid lots of if's in
the more frequently used parts of the C code. Do you see any performance
penalty I'm overlooking?
Regards,
Tobi
--
"There's an old saying: Don't change anything... ever!" -- Mr. Monk
More information about the User
mailing list