[Gambas-user] I need a hint on how to deleted duplicate items in a array

Fernando Cabral fernandojosecabral at ...626...
Sat Jul 1 05:18:42 CEST 2017


I thank you guys for the hints on counting and eliminating duplicates. In
the end, I resorted to something that is very simple and does the trick in
three steps. In the first step I sort the array.
In the second step I count the number of occurrences and prepend it to the
word itself (with a separator). In the third step I sort the array again,
so now I have it sorted by the number of occurrences from the largest to
the smallest.

That is all I need.

Nevertheless, I am concerned with the performance. For 69,725 words, from
which 8,987 were unique, it took 28 seconds for the code below to execute.
I will survive this 28 seconds, if I have to. But I still would like to
find a faster solution.

On the other hand, I think I am close to the fastest possible solution.
Basically, the array will be traversed once only, no matter how many terms
and how many repetitions it may have.

(What do you think about this efficiency, Tobi?)




















*MatchedWords.Sort(gb.ascent + gb.language + gb.IgnoreCase) For i = 0 To
MatchedWords.Max    n = 1    For j = i + 1 To MatchedWords.Max      If
(Comp(MatchedWords[i], MatchedWords[j], gb.language + gb.ignorecase) = 0)
Then         n += 1      Else         Break      Endif    Next
UniqWords.Push(Format(n, "0###") & "#" & MatchedWords[i])    i += (n - 1)
NextUniqWords.Sort(gb.descent + gb.language + gb.ignorecase)For i = 0 To
UniqWords.Max   Print UniqWords[i]Next*



2017-06-30 15:10 GMT-03:00 Gianluigi <bagonergi at ...626...>:

> Just for curiosity, on my computer, my function (double) processes 10
> million strings (first and last name) in about 3 seconds.
> Very naif measurement using Timers and a limited number of names and
> surnames eg Willy Weber has come up 11051 times
>
> To demonstrate the goodness of Tobias' arguments, about 1 million 3 cents a
> second I really understood (I hope) what he wanted to say.
>
> Sorry my response times but today my modem works worse than my brain.
>
> Regards
> Gianluigi
>
> 2017-06-30 17:58 GMT+02:00 Gianluigi <bagonergi at ...626...>:
>
> > Sorry Tobias,
> > other explanations are not necessary.
> > I would not be able to understand :-(
> > I accept what you already explained to me as a dogma and I will try to
> put
> > it into practice by copying your code :-).
> >
> > Thanks again.
> >
> > Gianluigi
> >
> > 2017-06-30 17:44 GMT+02:00 Gianluigi <bagonergi at ...626...>:
> >
> >>
> >> 2017-06-30 17:21 GMT+02:00 Tobias Boege <taboege at ...626...>:
> >>
> >>>
> >>> I wouldn't say there is anything *wrong* with it, but it also has
> >>> quadratic
> >>> worst-case running time. You use String[].Push() which is just another
> >>> name
> >>> for String[].Add(). Adding an element to an array (the straightforward
> >>> way)
> >>> is done by extending the space of that array by one further element and
> >>> storing the value there. But extending the space of an array could
> >>> potentially
> >>> require you to copy the whole array somewhere else (where you have
> enough
> >>> free memory at the end of the array to enlarge it). Doing worst-case
> >>> analysis,
> >>> we have to assume that this bad case always occurs.
> >>>
> >>> If you fill an array with n values, e.g.
> >>>
> >>>   Dim a As New Integer[]
> >>>   For i = 1 To n
> >>>     a.Add(i)
> >>>   Next
> >>>
> >>> then you loop n times and in the i-th iteration there will be already
> >>> i-many elements in your array. Adding one further element to it will,
> >>> in the worst case, require i copy operations to be performed.
> 9-year-old
> >>> C.F. Gauss will tell you that the amount of store operations is about
> >>> n^2.
> >>>
> >>>
> >> Tobias you are always kind and thank you very much.
> >> Is possible for you to explain this more elementarily, for me (a poorly
> >> educated boy :-) )
> >>
> >>
> >>
> >>> And your function does two jobs simultaneously but only returns the
> >>> result
> >>> of one of the jobs. The output you get is only worth half the time you
> >>> spent.
> >>>
> >>>
> >> I did two functions in one, just to save space, this is a simple
> example.
> >> :-)
> >>
> >> Regards
> >> Gianluigi
> >>
> >
> >
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Gambas-user mailing list
> Gambas-user at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gambas-user
>



-- 
Fernando Cabral
Blogue: http://fernandocabral.org
Twitter: http://twitter.com/fjcabral
e-mail: fernandojosecabral at ...626...
Facebook: f at ...3654...
Telegram: +55 (37) 99988-8868
Wickr ID: fernandocabral
WhatsApp: +55 (37) 99988-8868
Skype:  fernandojosecabral
Telefone fixo: +55 (37) 3521-2183
Telefone celular: +55 (37) 99988-8868

Enquanto houver no mundo uma só pessoa sem casa ou sem alimentos,
nenhum político ou cientista poderá se gabar de nada.



More information about the User mailing list