[Gambas-user] I need a hint on how to deleted duplicate items in a array

ML d4t4full at ...626...
Fri Jun 30 14:32:50 CEST 2017


On 30/06/17 08:20, Fernando Cabral wrote:
> 2017-06-30 7:44 GMT-03:00 Fabien Bodard <gambas.fr at ...626...>:
>> The best way is the nando one ... at least for gambas.
>> Since you don't have to care about the index value or the order,
>> the walk-ahead option is the better one.
>> Then Fernando ... for big, big things... I think you need to use a DB.
>> Or a native language.... maybe an SQLite in-memory structure would be good.
> Fabien, since this is a one-time-only thing, I don't think I'd be
> better off with a database.
> Basically, I read a text file and then break it down into words,
> sentences and paragraphs.
> Next I count the items in each array (words, sentences, paragraphs).
> Array.count works wonderfully.
> After that, I have to eliminate the duplicate words (Array.words). But
> in doing so, I also have to count how many times each word appeared.
> Finally I sort the Array.Sentences and the Array.Paragraphs by size
> (string.len()). The Array.Words are sorted by count + length. This is
> all working well.
> So, my quest is for the fastest way to eliminate the duplicate words
> while I count them.
> For the time being, here is a working solution based on the system's sort
> | uniq:
> Here is one of the versions I have been using:
> Exec ["/usr/bin/uniq", "Unsorted.txt", "Sorted.srt2"] Wait
> Exec ["/usr/bin/uniq", "-ci", "SortedWords.srt2", "SortedWords.srt3"] Wait
> Exec ["/usr/bin/sort", "-bnr", "SortedWords.srt3"] To UniqWords
> WordArray = Split(UniqWords, "\n")
> So, I end up with the result I want. It's effective. Now, it would be
> more elegant if I could do the same with Gambas. Of course, the
> sorting would be easy with the built-in WordArray.sort().
> But how about the '"/usr/bin/uniq", "-ci" ...' part?
> Regards
> - fernando

Not tried, but for the duplicate count, what about iterating the word
array copying each word to a keyed collection?
For any new given word, the value (item) added would be 1 (integer), and
the key would be UCase(word$).
If an error happens, the handler would just Inc the keyed Item value. So
(please note my syntax may be slightly off, especially in If Error):

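Public Function CountWordsInArray(sortedWordArray As String[]) As Collection

  Dim wordCount As New Collection
  Dim currentWord As String = Null

  For Each currentWord In sortedWordArray

    ' Try to add the word with a count of 1, keyed by its upper-case form
    Try wordCount.Add(1, UCase$(currentWord))
    If Error Then
      ' Add failed, presumably because the key already exists: bump its
      ' counter (collection items are accessed with square brackets)
      wordCount[UCase$(currentWord)] = wordCount[UCase$(currentWord)] + 1
      Error.Clear 'Is this needed, or even correct?
    End If

  Next

  Return wordCount

End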

The returned collection should be sorted if the array was, and for each
item you will have a numeric count as the item and the word as the key.
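
In case Collection.Add does not actually raise an error for a key that is
already there (I have not checked, and it may simply replace the value), here
is an untested variant that asks the collection first with Exist() instead of
relying on the error handler. The names CountWordsWithExist and wordKey are
made up for this sketch and are not from the thread:

```gambas
' Sketch only: counts words case-insensitively without error trapping.
' CountWordsWithExist and wordKey are hypothetical names.
Public Function CountWordsWithExist(wordArray As String[]) As Collection

  Dim wordCount As New Collection
  Dim currentWord As String
  Dim wordKey As String

  For Each currentWord In wordArray

    wordKey = UCase$(currentWord)
    If wordCount.Exist(wordKey) Then
      ' Word already seen: add one to its counter
      wordCount[wordKey] = wordCount[wordKey] + 1
    Else
      ' First occurrence: start the counter at 1
      wordCount[wordKey] = 1
    Endif

  Next

  Return wordCount

End

' Usage: walk the result, printing each key (the word) with its count
Public Sub Main()

  Dim counts As Collection
  Dim count As Integer

  counts = CountWordsWithExist(["the", "The", "word"])
  For Each count In counts
    Print counts.Key; ": "; count
  Next

End
```

Inside the For Each loop, counts.Key gives the word and the loop variable gives
its count, so this is roughly the part that "uniq -ci" was doing.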
Hope it helps,
zxMarce.
