[Gambas-user] problems in trie class

Benoît Minisini gambas at ...1...
Mon Dec 1 18:07:05 CET 2014


Le 01/12/2014 17:47, Tobias Boege a écrit :
> On Sat, 29 Nov 2014, Charlie Reinl wrote:
>> Am Samstag, den 29.11.2014, 20:05 +0100 schrieb Tobias Boege:
>>> On Tue, 18 Nov 2014, Karl Reinl wrote:
>>>> Salut Tobi,
>>>>
>>>> played with you trie example (trietest) it crash if
>>>> p = h.GetPrefix("texte") find nothing (p=null), even when change
>>>> to p = h.GetPrefix("Texte")
>>>>
>>>> My change is h["texte"] to h["Texte"] (source attached)
>>>
>>> Can you run your tests with #6688?
>>>
>>> Indeed there were bit width errors (I think) but what caused your particular
>>> error here was that a TriePrefix object would erroneously drop reference
>>> counts of its parent Trie object if it couldn't be created. Total nonsense.
>>>
>>> Thanks for your report!
>>>
>>> As for the leak you showed in a follow-up, I couldn't reproduce that. I'll
>>> try harder if the problem persists with #6688. But I can tell you that this
>>> does not necessarily indicate a severe problem / corruption, as I don't
>>> terminate strings in my trie backend code and it could just be some length
>>> calculations that went wrong. (Most probably that is because the Gambas
>>> string functions automatically use strlen() to determine a string's length
>>> when I give 0 as a length parameter. However, strlen() must not be used on
>>> the strings from my trie.)
>>>
>>> Regards,
>>> Tobi
>>>
>>
>> Salut Tobi,
>>
>> yes, now no more crash, only an error raises, thats oK.
>> The leak shown, I can't reproduce any more now..... BUT
>> Now TriePrefix is case sensitive, in my follow-up the TriePrefix "d"
>> showed me "D" AND "d"entries, now only the "d", may be thats how trie
>> work normally,
>>
>
> It should be case-sensitive. If it wasn't before on your system, that was
> a bug (I can't imagine where it came from, though).
>
>> but for my behaves non case sensitive would be better (or
>> a switch to do like that)
>> And we talked about something like <trie>.Add(Value,Key) for simplifing
>> filling.
>>
>
> OK, you get an Add() and Remove() method in #6699, similar to what
> Collection has.
>
> As for the case-insensitivity: I can add an optional constructor argument,
> Mode, which can be gb.Binary or gb.IgnoreCase (just as Collection has).
>
> (After writing about half an hour complaining how hard it would be to get
> case-insensitivity right and efficient) I just had a magnificent idea: I
> will extend the native Trie class in Gambas and do something like that:
>
>    ' Written from scratch, may contain syntax, etc. errors
>
>    Public Struct _Trie_Entry
>      Key As String
>      Value As Variant
>    End Struct
>
>    Public Sub Add(Value As Variant, Key As String)
>      Dim hEntry As New _Trie_Entry
>      Dim sKey As String = Key
>
>      If $iMode = gb.IgnoreCase Then sKey = String.Upper(Key)
>      hEntry.Key = Key
>      hEntry.Value = Value
>      Super.Add(Value, sKey)
>    End
>
> Similarly I can augment _get, _put, etc. so that you won't notice that the
> _Trie_Entry structure exists at all.
>
> If you request a case-insensitive Trie, all keys are upper-case'd internally
> and the real keys are saved as part of the stored object. So you can get
> your original key back later in an enumeration and I don't have to add
> branches in the hot paths of the trie code to support case-insensitivity
> (which would make the whole thing slower -- I don't know if it would be
> noticeable, but...). I will see if I can do that (_next() may impose a
> little problem or maybe not).
>
> @Benoit: I am not familiar with the caveats of UTF-8 strings. If I want to
> implement case-insensitivity by internally converting all characters to
> upper-case, is it sufficient to use String.Upper() or are there hidden
> pitfalls?
>
> Regards,
> Tobi
>

If they are keys, they should be ASCII, not UTF-8, like with the 
Collection class.

If case insensitive is really needed, you should really manage that 
inside the C code of the component.

Regards,

-- 
Benoît Minisini




More information about the User mailing list