[Gambas-user] Some file/use or progress bar questions.

Doriano Blengino doriano.blengino at ...1909...
Wed Feb 4 10:22:10 CET 2009


Jose J. Rodriguez wrote:
> On 2/3/09, richard terry <rterry at ...1946...> wrote:
>   
>> On Wed, 4 Feb 2009 04:07:01 am werner 007 wrote:
>>  > Hi Richard
>>  >
>>  > Maybe you could put the file in an array with split() if it makes sense
>>  > for further work.
>>  > Don't know how slow it is on such big files, but I had no bad experiences
>>  > with around 10000 lines.
>>     
> IMHO, anything that forces reading the whole file twice is very
> inelegant, besides making it take twice as long. Unless you really
> need the count of lines for more than "eye candy", it is better to use
> the file size and bytes read values to update the scroll bar.
>   
I agree about the inelegance and the waste of memory. Suppose that a file 
on disk takes 100K. When read in, it takes another 100K of memory. When 
split, the pieces take another 100K (or 105K or so). This alone is enough 
to say "it is not good if your file is some 100MB long".

But splitting could have some added benefit, depending on what kind of 
processing has to come afterwards: after splitting, the lines of the text 
file can be accessed in any order. And perhaps, in certain cases, splitting 
could be faster, simply because split() is a C subroutine and does not 
have the burden of interpreting bytecode.
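
To illustrate the random-access point, here is a minimal sketch in Python 
(not Gambas, just for illustration). The helper name load_lines is made up 
for this example, Python's str.split stands in for split(), and UTF-8 text 
is assumed.

```python
def load_lines(path):
    """Read the whole file and split it into a list of lines.

    Hypothetical helper for illustration; assumes UTF-8 text.
    """
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()           # first in-memory copy: the raw text
    lines = text.split("\n")      # second copy: one string per line
    if lines and lines[-1] == "":
        lines.pop()               # drop the empty piece after a final newline
    return lines
```

After the call, lines[41] gives line 42 directly, without rereading the 
file. The price is that the raw text and the split pieces both sit in 
memory for a moment.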

But normally I lean towards minimum resource usage: if there is 
little data to process, it will be fast anyway; if there is a lot of 
data to process, it will take time, but it will succeed. I say so 
because I remember when computers had 64K of RAM (or 640K, a few years 
later). Now we have 640M and 10 (or 100?) times the speed, and computers 
are still slow... :-)

Just to count the lines of a text file, the only solution (no doubt) 
is to read it line by line. For a 300M text file it could take a lot of 
time, but it will always succeed. If you try to read it into memory 
(twice!), you could run out of resources, and it is probably slower 
anyway. And... 'shell wc -l ...' is so short to write... :-)
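
A rough sketch of the line-by-line approach, again in Python rather than 
Gambas and only as an illustration. It also reports progress from the 
bytes read against the file size, as suggested earlier in the thread. The 
names count_lines and on_progress are made up for this example.

```python
import os


def count_lines(path, on_progress=None):
    """Count lines one at a time, holding only the current line in memory.

    on_progress is an optional, hypothetical callback that receives the
    fraction of the file read so far (0.0 to 1.0).
    """
    total = os.path.getsize(path)   # file size in bytes
    done = 0                        # bytes read so far
    lines = 0
    with open(path, "rb") as f:     # binary mode, so len(line) is in bytes
        for line in f:
            lines += 1
            done += len(line)
            if on_progress and total:
                on_progress(done / total)
    return lines
```

Note that this counts a final line without a trailing newline, which 
'wc -l' does not. A real progress bar would probably be updated every few 
thousand lines rather than on every line.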

Regards,
Doriano Blengino





