[Gambas-user] problems in trie class

Tobias Boege taboege at ...626...
Mon Dec 1 17:47:54 CET 2014


On Sat, 29 Nov 2014, Charlie Reinl wrote:
> Am Samstag, den 29.11.2014, 20:05 +0100 schrieb Tobias Boege:
> > On Tue, 18 Nov 2014, Karl Reinl wrote:
> > > Salut Tobi,
> > > 
> > > played with you trie example (trietest) it crash if 
> > > p = h.GetPrefix("texte") find nothing (p=null), even when change
> > > to p = h.GetPrefix("Texte")
> > > 
> > > My change is h["texte"] to h["Texte"] (source attached)
> > 
> > Can you run your tests with #6688?
> > 
> > Indeed there were bit width errors (I think) but what caused your particular
> > error here was that a TriePrefix object would erroneously drop reference
> > counts of its parent Trie object if it couldn't be created. Total nonsense.
> > 
> > Thanks for your report!
> > 
> > As for the leak you showed in a follow-up, I couldn't reproduce that. I'll
> > try harder if the problem persists with #6688. But I can tell you that this
> > does not necessarily indicate a severe problem / corruption, as I don't
> > terminate strings in my trie backend code and it could just be some length
> > calculations that went wrong. (Most probably that is because the Gambas
> > string functions automatically use strlen() to determine a string's length
> > when I give 0 as a length parameter. However, strlen() must not be used on
> > the strings from my trie.)
> > 
> > Regards,
> > Tobi
> > 
> 
> Salut Tobi,
> 
> yes, now no more crash, only an error raises, thats oK.
> The leak shown, I can't reproduce any more now..... BUT
> Now TriePrefix is case sensitive, in my follow-up the TriePrefix "d"
> showed me "D" AND "d"entries, now only the "d", may be thats how trie
> work normally,
>

It should be case-sensitive. If it wasn't before on your system, that was
a bug (I can't imagine where it came from, though).

> but for my behaves non case sensitive would be better (or
> a switch to do like that) 
> And we talked about something like <trie>.Add(Value,Key) for simplifing
> filling.
> 

OK, you get an Add() and Remove() method in #6699, similar to what
Collection has.

As for the case-insensitivity: I can add an optional constructor argument,
Mode, which can be gb.Binary or gb.IgnoreCase (just as Collection has).

(After writing about half an hour complaining how hard it would be to get
case-insensitivity right and efficient) I just had a magnificent idea: I
will extend the native Trie class in Gambas and do something like that:

  ' Written from scratch, may contain syntax, etc. errors

  Public Struct _Trie_Entry
    Key As String
    Value As Variant
  End Struct

  Public Sub Add(Value As Variant, Key As String)
    Dim hEntry As New _Trie_Entry
    Dim sKey As String = Key

    If $iMode = gb.IgnoreCase Then sKey = String.Upper(Key)
    hEntry.Key = Key
    hEntry.Value = Value
    Super.Add(Value, sKey)
  End

Similarly I can augment _get, _put, etc. so that you won't notice that the
_Trie_Entry structure exists at all.

If you request a case-insensitive Trie, all keys are upper-case'd internally
and the real keys are saved as part of the stored object. So you can get
your original key back later in an enumeration and I don't have to add
branches in the hot paths of the trie code to support case-insensitivity
(which would make the whole thing slower -- I don't know if it would be
noticeable, but...). I will see if I can do that (_next() may impose a
little problem or maybe not).

@Benoit: I am not familiar with the caveats of UTF-8 strings. If I want to
implement case-insensitivity by internally converting all characters to
upper-case, is it sufficient to use String.Upper() or are there hidden
pitfalls?

Regards,
Tobi

-- 
"There's an old saying: Don't change anything... ever!" -- Mr. Monk




More information about the User mailing list