[Gambas-user] I need a hint on how to delete duplicate items in an array
Gianluigi
bagonergi at ...626...
Fri Jun 30 16:57:56 CEST 2017
What was wrong with my example that led to this?
Public Sub Main()

  Dim sSort As String[] = ["A", "B", "B", "B", "C", "D", "D", "E", "E", "E", "E", "F"]
  Dim s As String

  For Each s In ReturnArrays(sSort, 0)
    Print s
  Next
  For Each s In ReturnArrays(sSort, -1)
    Print s
  Next

End

Private Function ReturnArrays(SortedArray As String[], withNumber As Boolean) As String[]

  Dim sSingle, sWithNumber As New String[]
  Dim i, n As Integer

  For i = 0 To SortedArray.Max
    ' This check can be avoided with Tobias's trick (For i = 1 To ...)
    If i < SortedArray.Max Then
      If SortedArray[i] = SortedArray[i + 1] Then
        Inc n
      Else
        Inc n
        sSingle.Push(SortedArray[i])
        sWithNumber.Push(n & SortedArray[i])
        n = 0
      Endif
    Endif
  Next
  Inc n
  sSingle.Push(SortedArray[SortedArray.Max])
  sWithNumber.Push(n & SortedArray[SortedArray.Max])

  If withNumber Then
    Return sWithNumber
  Else
    Return sSingle
  Endif

End
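For reference, with the sample array above the first loop should print A, B, C, D, E, F
(one item per line) and the second 1A, 3B, 1C, 2D, 4E, 1F, that is, each distinct item
prefixed with its count, much like uniq -c.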
Regards
Gianluigi
2017-06-30 15:05 GMT+02:00 Tobias Boege <taboege at ...626...>:
> On Fri, 30 Jun 2017, Fernando Cabral wrote:
> > 2017-06-30 7:44 GMT-03:00 Fabien Bodard <gambas.fr at ...626...>:
> >
> > > The best way is nando's one... at least for Gambas.
> > >
> > > Since you don't have to care about the index value or the order,
> > > the walk-ahead option is the better one.
> > >
> > >
> > > Then, Fernando... for big, big things, I think you need to use a DB,
> > > or a native language... maybe an SQLite in-memory structure could be good.
> > >
> >
> > Fabien, since this is a one-time-only thing, I don't think I'd be better
> > off with a database.
> > Basically, I read a text file and then break it down into words, sentences
> > and paragraphs.
> > Next I count the items in each array (words, sentences, paragraphs).
> > Array.Count works wonderfully.
> > After that, I have to eliminate the duplicate words (Array.Words). But in
> > doing it, I also have to count
> > how many times each word appeared.
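For context, the read-and-split step can be sketched in a couple of Gambas lines
(the file name and the separator characters here are assumptions, not taken from
Fernando's actual program):

' Sketch only: load the whole text and break it into words, dropping
' empty elements (the last True argument of Split()).
Dim sText As String = File.Load("input.txt")
Dim aWords As String[] = Split(sText, " \t\n.,;:!?", "", True)

Print "Words: "; aWords.Count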
> >
> > Finally I sort the Array.Sentences and the Array.Paragraphs by size
> > (String.Len()). The Array.Words are
> > sorted by count + length. This is all working well.
> >
> > So, my quest is for the fastest way to eliminate the duplicate words
> > while I count them.
> > For the time being, here is a working solution based on the system's
> > sort | uniq:
> >
> > Here is one of the versions I have been using:
> >
> > Exec ["/usr/bin/uniq", "Unsorted.txt", "Sorted.srt2"] Wait
> > Exec ["/usr/bin/uniq", "-ci", "SortedWords.srt2", SortedWords.srt3"]
> Wait
> > Exec ["/usr/bin/sort", "-bnr", SortedWords.srt3] To UniqWords
> >
>
> Are those temporary files? You can avoid those by piping your data into the
> processes and reading their output directly. Otherwise the Temp$() function
> gives you better temporary files.
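As a minimal sketch of the Temp$() idea (none of this code is from the thread; the
function name, sWords and the paths are assumptions): one throwaway file from Temp$()
replaces the fixed file names, and the final result is captured directly into a string
with To.

Private Function CountUniqueWords(sWords As String) As String[]

  ' Sketch only: sWords is assumed to hold one word per line.
  Dim sTmp As String = Temp$()
  Dim sResult As String

  File.Save(sTmp, sWords)
  Exec ["/usr/bin/sort", "-o", sTmp, sTmp] Wait    ' sort the temporary file in place
  Exec ["/usr/bin/uniq", "-ci", sTmp] Wait To sResult

  ' One "count word" entry per element; True drops the empty trailing element.
  Return Split(sResult, "\n", "", True)

End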
>
> > WordArray = Split(UniqWords, "\n")
> >
> > So, I end up with the result I want. It's effective. Now, it would be
> > more elegant if I could do the same
> > with Gambas. Of course, the sorting would be easy with the built-in
> > WordArray.Sort().
> > But how about the '"/usr/bin/uniq", "-ci" ...' part?
> >
>
> I feel like my other mail answered this, but I can give you another version
> of that routine (which I said I would leave as an exercise to you):
>
> ' Remove duplicates in an array like "uniq -ci". String comparison is
> ' case-insensitive. The i-th entry in the returned array counts how many
> ' times aStrings[i] (in the de-duplicated array) was present in the input.
> ' The data in ~aStrings~ is overwritten. Assumes the array is sorted.
> Private Function Uniq(aStrings As String[]) As Integer[]
>   Dim iSrc, iLast As Integer
>   Dim aCount As New Integer[](aStrings.Count)
>
>   If Not aStrings.Count Then Return []
>   iLast = 0
>   aCount[iLast] = 1
>   For iSrc = 1 To aStrings.Max
>     If String.Comp(aStrings[iSrc], aStrings[iLast], gb.IgnoreCase) Then
>       Inc iLast
>       aStrings[iLast] = aStrings[iSrc]
>       aCount[iLast] = 1
>     Else
>       Inc aCount[iLast]
>     Endif
>   Next
>
>   ' Now shrink the arrays to the memory they actually need
>   aStrings.Resize(iLast + 1)
>   aCount.Resize(iLast + 1)
>   Return aCount
> End
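Just to show how the routine above would be called, with a made-up word list (for
mixed-case data the sort itself would also need to ignore case so that equal words
end up next to each other):

Dim aWords As String[] = ["pear", "apple", "plum", "apple", "apple"]
Dim aCounts As Integer[]
Dim i As Integer

aWords.Sort()
aCounts = Uniq(aWords)
For i = 0 To aWords.Max
  Print aCounts[i]; " "; aWords[i]   ' prints "3 apple", "1 pear", "1 plum"
Next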
>
> What, in my opinion, is at least theoretically better here than the other
> proposed solutions is that it runs in linear time, while nando's is
> quadratic[*]. (Of course, if you sort beforehand, it will become n*log(n),
> which is still better than quadratic.)
>
> Attached is a test script with some words. It runs the sort + uniq
> utilities
> first and then Array.Sort() + the Uniq() function above. The program then
> prints the *diff* between the two outputs. I get an empty diff, meaning
> that
> my Gambas routines produce exactly the same output as the shell utilities.
>
> Regards,
> Tobi
>
> [*] He calls array functions Add() and Find() inside a For loop that runs
> over an array of size n. Adding elements to an array or searching an
> array have themselves worst-case linear complexity, giving quadratic
> overall. My implementation reserves some more space in advance to
> avoid calling Add() in a loop. Since the array is sorted, we can go
> without Find(), too. Actually, as you may know, adding an element to
> the end of an array can be implemented in amortized constant time
> (as C++'s std::vector does) by wasting some space, but AFAICS Gambas
> doesn't do this; I could be wrong, though.
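A hypothetical Gambas sketch of that std::vector-style growth, only to make the idea
concrete (this is not how the interpreter actually manages its arrays):

' Keep the logical length yourself and double the capacity with Resize()
' whenever the array is full: appends become O(1) amortized at the cost
' of some wasted space.
Private Function PushAmortized(aData As Integer[], iCount As Integer, iValue As Integer) As Integer

  If iCount = aData.Count Then aData.Resize(Max(1, aData.Count * 2))
  aData[iCount] = iValue
  Return iCount + 1   ' the new logical length

End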
>
> --
> "There's an old saying: Don't change anything... ever!" -- Mr. Monk
>
> _______________________________________________
> Gambas-user mailing list
> Gambas-user at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gambas-user
>
>
More information about the Gambas-user mailing list