[Gambas-user] fastest way to "zero-out" a huge binary file
Doriano Blengino
doriano.blengino at ...1909...
Mon Jan 25 10:25:03 CET 2010
kevinfishburne wrote:
> Doriano Blengino wrote:
>
>> Read carefully the documentation about the WRITE instruction - the
>> explanation is in the first few lines. When you write something out,
>> you can specify the length of the data you write. If you don't specify
>> it, Gambas decides it for you, and that is what you don't want.
>>
>>
>
> I did exactly that last night, and also found the String$ function, which was
> helpful in creating the string of zeros. I can now create a
> zero-filled 8 GB file in about 60 seconds. My roots are in QuickBASIC 4.5,
> but my Gambas skills are improving the more I use it, thankfully. This forum
> is probably the most helpful forum in forum history as well.
>
>
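For the archives, the trick is entirely in the optional length argument: with
it, WRITE dumps the raw bytes of the string instead of serializing the value.
A rough sketch of the chunked zero-fill - untested and from memory, so check
the exact OPEN/WRITE forms for your Gambas version; the file name and chunk
size are only examples:

  ' Build one chunk of zero bytes with String$() and write it repeatedly,
  ' passing an explicit length so WRITE emits the raw bytes.
  Dim hFile As File
  Dim sChunk As String
  Dim i As Integer

  sChunk = String$(1048576, Chr$(0))       ' 1 MiB of zero bytes
  hFile = Open "/tmp/heightmap.raw" For Write Create
  For i = 1 To 8192                        ' 8192 * 1 MiB = 8 GiB, adjust to taste
    Write #hFile, sChunk, Len(sChunk)      ' explicit length -> raw bytes, no serialization
  Next
  Close #hFile

Writing a big chunk in a loop is much faster than billions of two-byte writes,
because the per-call overhead is paid only a few thousand times.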
> Doriano Blengino wrote:
>
>> I don't know the algorithm you want to use on so much data, but if you
>> look at it closely, maybe you can find some shortcut to minimize the
>> reads and writes to the file. You could read data in chunks, calculate
>> intermediate results in RAM, and so on. If you try to randomly read
>> 65537*65537 small pieces of data, your program will take forever to run,
>> because the input/output overhead is paid on every two bytes. By reading
>> 4 bytes at a time instead of two, you gain twice the speed, and so on.
>> If the computation is heavy, then there will be a point of best compromise
>> between the cost of I/O and the cost of computation. If the computation is
>> very, very heavy, then the I/O time will be negligible.
>>
>> Another nice thing would be to compress the data on its way to and from
>> disk, if the data can be managed in blocks. You might even reduce it to
>> 1/10 of its size, which would be a big improvement.
>>
>>
>
> Those are accurate observations and my main devil at this point. Here's an
> explanation of the algorithm:
>
> http://www.gameprogrammer.com/fractal.html#diamond
> http://en.wikipedia.org/wiki/Diamond-square_algorithm
>
> Basically think of the file as a square grid of byte pairs (shorts), each
> representing an elevation point on a map. Each pair of bytes is read
> directly into a short for processing as necessary. Initially the four corner
> points are read and the algorithm is applied, generating the center point
> which is then written back to the file (figure b in the first link).
>
> The second part of the algorithm uses these five points plus four points
> initially outside the file to generate four more points (figure c in the
> first link). It's the same as the first part of the algorithm, but rotated
> 45 degrees and performed four times in different areas.
>
> After the two algorithm variations are applied they are repeated at
> half-scale across all the points, exponentially increasing the number of
> iterations per pass (1x1, 4x4, 16x16, 64x64, etc.).
>
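Just to pin down the arithmetic behind that: a point at column x, row y of a
65537-wide grid of shorts sits at byte offset (y * 65537 + x) * 2, and with an
~8 GB file that offset no longer fits in a 32-bit Integer, so it must be
computed as a Long. An untested sketch, Gambas syntax from memory (the
procedure names are mine, and the exact SEEK/READ forms may differ between
Gambas versions):

  ' Read/write one elevation point of a 65537 x 65537 grid of Shorts.
  ' The offset must be a Long: 65537 * 65537 * 2 bytes overflows an Integer.
  Sub WritePoint(hFile As File, x As Integer, y As Integer, nValue As Short)
    Seek #hFile, (CLong(y) * 65537 + x) * 2
    Write #hFile, nValue
  End

  Function ReadPoint(hFile As File, x As Integer, y As Integer) As Short
    Dim nValue As Short
    Seek #hFile, (CLong(y) * 65537 + x) * 2
    Read #hFile, nValue
    Return nValue
  End

The square step of figure (b) is then just the average of the four corners
plus the random displacement, for example:

  ' center of a square of half-size h around (cx, cy); nNoise stands for
  ' whatever random displacement the current pass uses
  Dim nSum As Integer
  nSum = ReadPoint(hFile, cx - h, cy - h) + ReadPoint(hFile, cx + h, cy - h)
  nSum += ReadPoint(hFile, cx - h, cy + h) + ReadPoint(hFile, cx + h, cy + h)
  WritePoint(hFile, cx, cy, nSum \ 4 + nNoise)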
> The only thing I can think to do to increase I/O efficiency without adding
> more overhead to the number crunching (which is already significant) would
> be to assign each read point to a small array of shorts so the points would
> not have to be read from disk more than once per pass.
>
> Another idea would be to wait for the subdivisions to become small enough
> that one could be loaded at once into a huge array of shorts (2 GB, for
> instance), then operate solely on that chunk until it was finished and
> commit it back to disk. Because of the 45-degree rotation in the second part
> of the algorithm I'd still have to read some bytes from outside the array,
> however. That would also disrupt the nice nested FOR...NEXT loop I have
> controlling the iterations.
>
This is exactly what I thought of at first. The algorithm starts by
examining points far apart from each other, but then, by halving, it moves to
smaller and smaller squares. At a certain point the squares get small
enough to be kept easily in RAM; say, a 1024*1024 square. From then on,
if I understand correctly, there is no "pollution" from the surrounding
points, so you can use an array and recursively call a routine on it,
passing the four vertexes.
The subroutine could look something like this:
sub square_diamond(left as integer, right as integer, top as integer, bottom as integer)
  dim step as integer

  step = (right - left) \ 2          ' half the side of the current square
  if step < 1 then return            ' nothing left to subdivide

  if right - left <= MAX_ARRAY_SIZE then
    ' use internal cached data,
    ' but first load it from disk if this is the first time we got this small
    if right - left = MAX_ARRAY_SIZE then load_array_from_disk(...)
    cache[left + step, top + step] = (blah + blah + blah + blah) / 4
  else
    ' seek in the disk file, load the 4/8 values, calculate,
    ' and write the result back to disk
  endif

  ' apply this algorithm to the four resulting squares
  square_diamond(left, left + step, top, top + step)         ' upper left
  square_diamond(left + step, right, top, top + step)        ' upper right
  square_diamond(left, left + step, top + step, bottom)      ' lower left
  square_diamond(left + step, right, top + step, bottom)     ' lower right

  if step = 1 then
    ' finished with this square: if it is the lower-rightmost square
    ' of the cache, write the cache back to disk
    ' sorry... too lazy to continue this...
  endif
end
The above code surely still has bugs and omissions, but this is the
whole idea.
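To start it off you would call it once on the whole grid - hypothetically,
with 65537 points per side the corner indexes are 0 and 65536:

  ' top-level call on the full grid (indexes 0 .. 65536)
  square_diamond(0, 65536, 0, 65536)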
Hope this helps - I cannot think of anything better at the moment.
Sometimes one can try to imagine the data in a totally different way; for
example, are we sure we must see the file as 65537 rows of 65537
columns each? Could it be that seeing it as a series of bigger and bigger frames
(1 dot, the central one, then the 8 surrounding dots starting from the top,
then the next frame and so on) would help? Probably not, but this is
just a quick attempt at a different arrangement.
Regards,
Doriano