On 30/06/17 08:20, Fernando Cabral wrote:
> 2017-06-30 7:44 GMT-03:00 Fabien Bodard <gambas.fr at ...626...>:
>> The best way is the nando one ... at least for gambas.
>> As you do not have to care about the index value or the order,
>> the walk-ahead option is the better one.
>> Then Fernando ... for big, big things... I think you need to use a DB.
>> Or a native language.... maybe a sqlite memory structure can be good.
> Fabien, since this is a one-time only thing, I don't think I'd be
> better off with a database.
> Basically, I read a text file and then break it down into words,
> sentences and paragraphs.
> Next I count the items in each array (words, sentences, paragraphs).
> Array.count works wonderfully.
> After that, I have to eliminate the duplicate words (Array.words). But
> in doing it, I also have to count how many times each word appeared.
> Finally I sort the Array.Sentences and the Array.Paragraphs by size
> (string.len()). The Array.Words are sorted by count + length. This is
> all working well.
> So, my quest is for the fastest way to eliminate the duplicate words
> while I count them.
> For the time being, here is a working solution based on the system's sort
> | uniq:
> Here is one of the versions I have been using:
> Exec ["/usr/bin/uniq", "Unsorted.txt", "Sorted.srt2"] Wait
> Exec ["/usr/bin/uniq", "-ci", "SortedWords.srt2", "SortedWords.srt3"] Wait
> Exec ["/usr/bin/sort", "-bnr", "SortedWords.srt3"] To UniqWords
> WordArray = split (UniqWords, "\n")
> So, I end up with the result I want. It's effective. Now, it would be
> more elegant if I could do the same with Gambas. Of course, the
> sorting would be easy with the built-in WordArray.sort ().
> But how about the '"/usr/bin/uniq", "-ci" ...' part?
> Regards
> - fernando
Not tried, but for the duplicate count, what about iterating the word
array, copying each word to a keyed collection?
For any new given word, the value (item) added would be 1 (integer), and
the key would be UCase(word$).
If the key is already there, the code would just increment the keyed
item's value instead. So (please note my syntax may be slightly off; I am
testing with Exist() rather than trapping an error, since as far as I know
Add() does not fail on a key that is already present):
Public Function CountWordsInArray(sortedWordArray As String[]) As Collection

  Dim wordCount As New Collection
  Dim currentWord As String
  Dim wordKey As String

  For Each currentWord In sortedWordArray
    wordKey = UCase$(currentWord) 'Upper-case key, so the count ignores case
    If wordCount.Exist(wordKey) Then
      wordCount[wordKey] = wordCount[wordKey] + 1 'Seen before: bump the count
    Else
      wordCount[wordKey] = 1 'First time this word shows up
    Endif
  Next

  Return wordCount

End
The returned collection should be sorted if the array was, and for each
item you will have a numeric count as the item and the word as the key.
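
To get something close to the "uniq -ci" plus "sort -bnr" listing out of
such a collection, a rough and untested sketch could look like the one
below. PrintWordCounts, the variable names and the 8-digit zero padding are
illustrative assumptions only; the padding is there so that a plain string
sort also orders the lines by count.

```gambas
' Untested sketch: print "count word" lines, highest count first.
' PrintWordCounts and the 8-digit padding are hypothetical choices.
Public Sub PrintWordCounts(wordCount As Collection)

  Dim hits As Variant
  Dim rows As New String[]
  Dim textRow As String

  ' For Each over a Collection yields the values;
  ' the Key property holds the key of the item just enumerated.
  For Each hits In wordCount
    ' Zero-pad the count so string order matches numeric order
    rows.Add(Format$(hits, "00000000") & " " & wordCount.Key)
  Next

  ' gb.Descent sorts from the highest value down to the lowest
  rows.Sort(gb.Descent)

  For Each textRow In rows
    Print textRow
  Next

End
```

It would be called as PrintWordCounts(CountWordsInArray(WordArray)). Words
with the same count come out in descending alphabetical order, which may or
may not be what is wanted.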
Hope it helps,
zxMarce.