[Gambas-user] fastest way to "zero-out" a huge binary file

kevinfishburne kevinfishburne at ...1887...
Sun Jan 24 22:02:04 CET 2010



Doriano Blengino wrote:
> 
> Read carefully the documentation about the WRITE instruction - in the 
> first few lines you find the explanation. When you write something out, 
> you can specify the length of the data you write. If you don't specify 
> it, gambas will do it for you, and this is what you don't want.
> 

I did exactly that last night, and also found the String$ function, which was
helpful in creating the string of zeros. I can now create a zero-filled 8 GB
file in about 60 seconds. My roots are in QuickBASIC 4.5, but thankfully my
GAMBAS skills are improving the more I use it. This is probably the most
helpful forum in forum history as well.


Doriano Blengino wrote:
> 
> I don't know the algorithm you want to use on so much data, but if you 
> look deeply at it, maybe you will find some shortcut to minimize reads 
> and writes to the file. You could read data in chunks, calculate 
> intermediate results in RAM, and so on. If you try to randomly read 
> 65537*65537 small pieces of data, your program will take forever to run, 
> because the input/output overhead will be incurred on every two bytes. By 
> reading 4 bytes at a time instead of two, you will gain twice the speed, 
> and so on. If the calculation is heavy, then there will be a point of best 
> compromise between the cost of I/O and the cost of the calculation. If 
> the calculation is very, very heavy, then the I/O time will be
> negligible.
> 
> Another nice thing could be to compress data to and from disk, if data 
> can be managed by blocks. You could even reduce data to 1/10, and this 
> would be a big improvement.
> 

Those are accurate observations and my main devil at this point. Here's an
explanation of the algorithm:

http://www.gameprogrammer.com/fractal.html#diamond
http://en.wikipedia.org/wiki/Diamond-square_algorithm

Basically think of the file as a square grid of byte pairs (shorts), each
representing an elevation point on a map. Each pair of bytes is read
directly into a short for processing as necessary. Initially the four corner
points are read and the algorithm is applied, generating the center point
which is then written back to the file (figure b in the first link).

The second part of the algorithm uses these five points plus four points
initially outside the file to generate four more points (figure c in the
first link). It's the same as the first part of the algorithm, but rotated
45 degrees and performed four times in different areas.

After the two algorithm variations are applied they are repeated at
half-scale across all the points, exponentially increasing the number of
iterations per pass (1, 4, 16, 64 squares, etc.).
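
A minimal in-memory sketch of the two steps described above, written in
Python rather than Gambas so it can be run on a small grid. It is not the
file-based program discussed in this thread. The function name, the
roughness and seed parameters, and the choice to average only the neighbours
that fall inside the grid (instead of using points outside it) are all
assumptions made for illustration.

```python
import random

def diamond_square(n, roughness=1.0, seed=0):
    """Return a (2**n + 1)-square grid of heights built by diamond-square."""
    rng = random.Random(seed)
    size = 2 ** n + 1
    grid = [[0.0] * size for _ in range(size)]   # corners start at 0.0
    step = size - 1
    scale = roughness
    while step > 1:
        half = step // 2
        # First part: centre of each square = mean of its 4 corners + noise.
        for y in range(half, size, step):
            for x in range(half, size, step):
                corners = (grid[y - half][x - half] + grid[y - half][x + half] +
                           grid[y + half][x - half] + grid[y + half][x + half])
                grid[y][x] = corners / 4.0 + rng.uniform(-scale, scale)
        # Second part (rotated 45 degrees): each edge midpoint = mean of the
        # neighbours above, below, left and right that lie inside the grid.
        for y in range(0, size, half):
            for x in range((y + half) % step, size, step):
                total, count = 0.0, 0
                for dy, dx in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < size and 0 <= nx < size:
                        total += grid[ny][nx]
                        count += 1
                grid[y][x] = total / count + rng.uniform(-scale, scale)
        step = half          # repeat at half scale
        scale /= 2.0         # assumed: noise range halves with each pass
    return grid

heights = diamond_square(3)      # 9 x 9 grid
```

Each trip through the while loop is one pass. The number of squares handled
in the first part goes 1, 4, 16, and so on as the step halves.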

The only thing I can think to do to increase I/O efficiency without adding
more overhead to the number crunching (which is already significant) would
be to assign each read point to a small array of shorts so the points would
not have to be read from disk more than once per pass.
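
A rough sketch of that caching idea, in Python for illustration. It assumes
the points are stored row by row as little-endian signed 16-bit values, which
this thread does not actually specify. The class name PointCache and its
methods are hypothetical.

```python
import io
import struct

class PointCache:
    """Read 16-bit points from a grid file, at most once each per pass."""

    def __init__(self, stream, side):
        self.stream = stream      # binary file-like object opened for reading
        self.side = side          # points per grid row (assumed row-major)
        self.cache = {}           # (x, y) -> value for the current pass
        self.disk_reads = 0       # counts real reads, for illustration only

    def offset(self, x, y):
        # Two bytes per point, rows stored one after another.
        return (y * self.side + x) * 2

    def get(self, x, y):
        key = (x, y)
        if key not in self.cache:
            self.stream.seek(self.offset(x, y))
            (value,) = struct.unpack("<h", self.stream.read(2))
            self.cache[key] = value
            self.disk_reads += 1
        return self.cache[key]

    def start_pass(self):
        # Forget cached points so the next pass sees values written since.
        self.cache.clear()

# Tiny 3 x 3 grid holding the values 0..8, kept in memory for the example.
points = PointCache(io.BytesIO(struct.pack("<9h", *range(9))), side=3)
points.get(1, 2)
points.get(1, 2)              # second call is served from the cache
```

A real version would also have to update the cache whenever a point is
written, and on the full 65537 x 65537 grid the cache itself would need a
size limit.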

Another idea would be to wait for the subdivisions to become small enough
that one could be loaded at once into a huge array of shorts (2 GB, for
instance), then operate solely on that chunk until it was finished and
commit it back to disk. Because of the second part of the algorithm's 45
degree angle I'd still have to read some bytes that were outside the array,
however. That would also disrupt my nice nested FOR...NEXT loop I have going
to control the iterations.
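
To get a feel for when a subdivision would fit in RAM, the arithmetic can be
sketched as below (Python, for illustration only). It assumes each chunk is a
square tile with 2**k + 1 points per side, so that neighbouring tiles share
their edge points, and 2 bytes per point. The function name and the 2 GiB
budget are only examples.

```python
def largest_tile_side(budget_bytes, bytes_per_point=2):
    """Largest side of the form 2**k + 1 whose square tile fits the budget."""
    k = 0
    while (2 ** (k + 1) + 1) ** 2 * bytes_per_point <= budget_bytes:
        k += 1
    return 2 ** k + 1

side = largest_tile_side(2 * 1024 ** 3)   # 2 GiB budget
print(side, side * side * 2)              # prints: 16385 536936450
```

Under these assumptions a 32769-point tile needs 2,147,614,722 bytes, just
over 2 GiB, so the largest tile that fits is 16385 points per side (about
512 MB).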

I fear the bottom line is this app is going to bust my huevos and ultimately
take a long time to run. :(

For those reading this who still need to know how to zero-out a big data
file, here's my code modified for general use:

' General declarations.
DIM somefile AS File ' Handle of the file to zero out.
DIM Zeros AS String ' Used for zeroing-out file.
DIM counter AS Long ' Generic counter.

' Open file to zero out.
somefile = OPEN "somefile" FOR INPUT OUTPUT CREATE

' Create 8 GB file full of zeros in 1 MB writes (8192 x 1048576 bytes).
Zeros = String$(1048576, Chr$(0))
FOR counter = 1 TO 8192
  WRITE #somefile, Zeros, 1048576
NEXT
' Add additional length as needed in 1 byte writes. 262146 more bytes bring
' the total to 65537 * 65537 * 2 bytes, the size of my grid of shorts.
Zeros = Chr$(0)
FOR counter = 1 TO 262146
  WRITE #somefile, Zeros, 1
NEXT

' Close the file when done.
CLOSE #somefile
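
The two loop counts above can be derived from the target size instead of
being hard-coded. Here is a small sketch of that arithmetic in Python (not
Gambas), with the leftover bytes written in one call rather than one byte at
a time. The function name zero_fill and its parameters are made up for this
example.

```python
import io

def zero_fill(stream, total_bytes, chunk_bytes=1048576):
    """Write total_bytes zero bytes to stream in chunk_bytes-sized writes."""
    full_chunks, remainder = divmod(total_bytes, chunk_bytes)
    zeros = bytes(chunk_bytes)            # one reusable block of zero bytes
    for _ in range(full_chunks):
        stream.write(zeros)
    if remainder:
        stream.write(zeros[:remainder])   # single write for the leftover bytes
    return full_chunks, remainder

# The grid in this thread: 65537 x 65537 points, 2 bytes each.
print(divmod(65537 * 65537 * 2, 1048576))   # prints: (8192, 262146)

# Small demonstration on an in-memory stream instead of a real file.
demo = io.BytesIO()
zero_fill(demo, 2500000)
```

For a real file the stream would come from open("somefile", "wb") instead of
io.BytesIO.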


-----
Kevin Fishburne, Eight Virtues
www:  http://sales.eightvirtues.com
e-mail:  sales at ...1887...
phone: (770) 853-6271