[Gambas-user] I need a hint on how to delete duplicate items in an array
Gianluigi
bagonergi at ...626...
Fri Jun 30 16:57:56 CEST 2017
What was wrong with my example that led to this?
Public Sub Main()

  Dim sSort As String[] = ["A", "B", "B", "B", "C", "D", "D", "E", "E", "E", "E", "F"]
  Dim s As String

  For Each s In ReturnArrays(sSort, 0)
    Print s
  Next
  For Each s In ReturnArrays(sSort, -1)
    Print s
  Next

End

Private Function ReturnArrays(SortedArray As String[], withNumber As Boolean) As String[]

  Dim sSingle, sWithNumber As New String[]
  Dim i, n As Integer

  For i = 0 To SortedArray.Max
    ' This check can be avoided with Tobias's trick (For i = 1 To ...)
    If i < SortedArray.Max Then
      If SortedArray[i] = SortedArray[i + 1] Then
        Inc n
      Else
        Inc n
        sSingle.Push(SortedArray[i])
        sWithNumber.Push(n & SortedArray[i])
        n = 0
      Endif
    Endif
  Next
  Inc n
  sSingle.Push(SortedArray[SortedArray.Max])
  sWithNumber.Push(n & SortedArray[SortedArray.Max])

  If withNumber Then
    Return sWithNumber
  Else
    Return sSingle
  Endif

End
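For reference, with the sample array above the first loop should print A, B, C, D, E, F
(one item per line) and the second 1A, 3B, 1C, 2D, 4E, 1F, that is, each distinct item
prefixed with its count, much like uniq -c.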
Regards
Gianluigi
2017-06-30 15:05 GMT+02:00 Tobias Boege <taboege at ...626...>:
> On Fri, 30 Jun 2017, Fernando Cabral wrote:
> > 2017-06-30 7:44 GMT-03:00 Fabien Bodard <gambas.fr at ...626...>:
> >
> > > The best way is nando's one... at least for Gambas.
> > >
> > > Since you don't have to care about the index value or the order,
> > > the walk-ahead option is the better one.
> > >
> > >
> > > Then, Fernando... for big, big things, I think you need to use a DB,
> > > or a native language... maybe an SQLite in-memory structure could be good.
> > >
> >
> > Fabien, since this is a one-time-only thing, I don't think I'd be better
> > off with a database.
> > Basically, I read a text file and then break it down into words, sentences
> > and paragraphs.
> > Next I count the items in each array (words, sentences, paragraphs).
> > Array.Count works wonderfully.
> > After that, I have to eliminate the duplicate words (Array.Words). But in
> > doing it, I also have to count
> > how many times each word appeared.
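For context, the read-and-split step can be sketched in a couple of Gambas lines
(the file name and the separator characters here are assumptions, not taken from
Fernando's actual program):

' Sketch only: load the whole text and break it into words, dropping
' empty elements (the last True argument of Split()).
Dim sText As String = File.Load("input.txt")
Dim aWords As String[] = Split(sText, " \t\n.,;:!?", "", True)

Print "Words: "; aWords.Count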
> >
> > Finally I sort the Array.Sentences and the Array.Paragraphs by size
> > (String.Len()). The Array.Words are
> > sorted by count + length. This is all working well.
> >
> > So, my quest is for the fastest way to eliminate the duplicate words
> > while I count them.
> > For the time being, here is a working solution based on the system's
> > sort | uniq:
> >
> > Here is one of the versions I have been using:
> >
> > Exec ["/usr/bin/uniq", "Unsorted.txt", "Sorted.srt2"] Wait
> > Exec ["/usr/bin/uniq", "-ci", "SortedWords.srt2", SortedWords.srt3"]
> Wait
> > Exec ["/usr/bin/sort", "-bnr", SortedWords.srt3] To UniqWords
> >
>
> Are those temporary files? You can avoid those by piping your data into the
> processes and reading their output directly. Otherwise the Temp$() function
> gives you better temporary files.
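As a minimal sketch of the Temp$() idea (none of this code is from the thread; the
function name, sWords and the paths are assumptions): one throwaway file from Temp$()
replaces the fixed file names, and the final result is captured directly into a string
with To.

Private Function CountUniqueWords(sWords As String) As String[]

  ' Sketch only: sWords is assumed to hold one word per line.
  Dim sTmp As String = Temp$()
  Dim sResult As String

  File.Save(sTmp, sWords)
  Exec ["/usr/bin/sort", "-o", sTmp, sTmp] Wait    ' sort the temporary file in place
  Exec ["/usr/bin/uniq", "-ci", sTmp] Wait To sResult

  ' One "count word" entry per element; True drops the empty trailing element.
  Return Split(sResult, "\n", "", True)

End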
>
> > WordArray = Split(UniqWords, "\n")
> >
> > So, I end up with the result I want. It's effective. Now, it would be
> > more elegant if I could do the same
> > with Gambas. Of course, the sorting would be easy with the built-in
> > WordArray.Sort().
> > But how about the '"/usr/bin/uniq", "-ci" ...' part?
> >
>
> I feel like my other mail answered this, but I can give you another version
> of that routine (which I said I would leave as an exercise to you):
>
> ' Remove duplicates in an array like "uniq -ci". String comparison is
> ' case-insensitive. The i-th entry in the returned array counts how many
> ' times aStrings[i] (in the de-duplicated array) was present in the input.
> ' The data in ~aStrings~ is overwritten. Assumes the array is sorted.
> Private Function Uniq(aStrings As String[]) As Integer[]
>   Dim iSrc, iLast As Integer
>   Dim aCount As New Integer[](aStrings.Count)
>
>   If Not aStrings.Count Then Return []
>   iLast = 0
>   aCount[iLast] = 1
>   For iSrc = 1 To aStrings.Max
>     If String.Comp(aStrings[iSrc], aStrings[iLast], gb.IgnoreCase) Then
>       Inc iLast
>       aStrings[iLast] = aStrings[iSrc]
>       aCount[iLast] = 1
>     Else
>       Inc aCount[iLast]
>     Endif
>   Next
>
>   ' Now shrink the arrays to the memory they actually need
>   aStrings.Resize(iLast + 1)
>   aCount.Resize(iLast + 1)
>   Return aCount
> End
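Just to show how the routine above would be called, with a made-up word list (for
mixed-case data the sort itself would also need to ignore case so that equal words
end up next to each other):

Dim aWords As String[] = ["pear", "apple", "plum", "apple", "apple"]
Dim aCounts As Integer[]
Dim i As Integer

aWords.Sort()
aCounts = Uniq(aWords)
For i = 0 To aWords.Max
  Print aCounts[i]; " "; aWords[i]   ' prints "3 apple", "1 pear", "1 plum"
Next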
>
> What, in my opinion, is at least theoretically better here than the other
> proposed solutions is that it runs in linear time, while nando's is
> quadratic[*]. (Of course, if you sort beforehand, it will become n*log(n),
> which is still better than quadratic.)
>
> Attached is a test script with some words. It runs the sort + uniq
> utilities
> first and then Array.Sort() + the Uniq() function above. The program then
> prints the *diff* between the two outputs. I get an empty diff, meaning
> that
> my Gambas routines produce exactly the same output as the shell utilities.
>
> Regards,
> Tobi
>
> [*] He calls array functions Add() and Find() inside a For loop that runs
> over an array of size n. Adding elements to an array or searching an
> array have themselves worst-case linear complexity, giving quadratic
> overall. My implementation reserves some more space in advance to
> avoid calling Add() in a loop. Since the array is sorted, we can go
> without Find(), too. Actually, as you may know, adding an element to
> the end of an array can be implemented in amortized constant time
> (as C++'s std::vector does) by wasting some space, but AFAICS Gambas
> doesn't do this; I could be wrong, though.
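A hypothetical Gambas sketch of that std::vector-style growth, only to make the idea
concrete (this is not how the interpreter actually manages its arrays):

' Keep the logical length yourself and double the capacity with Resize()
' whenever the array is full: appends become O(1) amortized at the cost
' of some wasted space.
Private Function PushAmortized(aData As Integer[], iCount As Integer, iValue As Integer) As Integer

  If iCount = aData.Count Then aData.Resize(Max(1, aData.Count * 2))
  aData[iCount] = iValue
  Return iCount + 1   ' the new logical length

End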
>
> --
> "There's an old saying: Don't change anything... ever!" -- Mr. Monk
>
> _______________________________________________
> Gambas-user mailing list
> Gambas-user at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gambas-user
>
>
More information about the Gambas-user mailing list