[Gambas-user] using a "file system database"

Doriano Blengino doriano.blengino at ...1909...
Wed Apr 20 09:08:25 CEST 2011


Kevin Fishburne wrote:
>>> My current plan is to create a directory for each region
>>> ([65536/32/32]^2). Each region directory contains 32^2 data files
>>> (1024). Hopefully this won't stress any particular file system as far as
>>> how many directories and files are contained within a single directory.
>>>
>>>       
>> But... I am missing something... the number of files was 4M, right? 
>> And 64 directories with 1024 files do not add up to 4M...
>>     
>
> A cell is 32x32 tiles (bytes), and the map is 65536^2 tiles, so there 
> are 2048^2 cells. Each of these is organized into directories of 32^2 
> cells (regions). The data files alone number 4.2 million; the 
> directories that organize them hopefully add a layer of filesystem 
> efficiency. Right now a separate directory is created for each region, 
> and empty data files are created for each cell within that region.
>
>   
I can't explain why I didn't understand this at first. Well, actually I 
can: apparently I missed the square ("^2").
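
For the record, here is a quick sketch (in Python, not Gambas, just to 
check the arithmetic) that reproduces the numbers from your description:

    # Layout arithmetic from Kevin's description.
    TILES_PER_SIDE = 65536     # map is 65536 x 65536 tiles
    CELL_SIDE = 32             # a cell is 32 x 32 tiles (1 byte per tile)
    REGION_SIDE = 32           # a region is 32 x 32 cells

    cells_per_side = TILES_PER_SIDE // CELL_SIDE       # 2048
    total_cells = cells_per_side ** 2                  # 4194304 data files
    regions_per_side = cells_per_side // REGION_SIDE   # 64
    total_regions = regions_per_side ** 2              # 4096 directories
    files_per_region = REGION_SIDE ** 2                # 1024 files each

    print(total_cells, total_regions, files_per_region)
    # 4194304 4096 1024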

So there are 4096 directories containing 1024 files each, and each file 
is 1 KB in size (or not?). If this is correct, you should choose a small 
allocation unit for your file system (1 or 2 KB?). Assuming the file 
system uses, say, 128 bytes to store the metadata for a file, you will 
have 4 GB of data, 512 MB of metadata for the files, and 
4096*128 = 512 KB of metadata for the directories. More than 12% of the 
total volume is spent organizing the data, which is not a bad figure, 
but in your case it could be improved by merging, for example, all 1024 
files of a single directory into one file, if that is possible (see the 
sketch below). If the files differed in length, then using the file 
system to organize them would be the easiest way. But this is just a 
theoretical supposition...
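
A rough sketch of the merging I mean, again in Python rather than 
Gambas, assuming every cell is a fixed 1 KB record (the file layout and 
function names are made up for illustration):

    CELL_SIZE = 1024          # assumed: one 32x32 cell, 1 byte per tile
    CELLS_PER_REGION = 1024   # 32^2 cells packed into one region file

    def create_region(region_path):
        # Pre-allocate one 1 MB file per region (1024 cells x 1 KB).
        with open(region_path, "wb") as f:
            f.write(b"\0" * (CELL_SIZE * CELLS_PER_REGION))

    def read_cell(region_path, cell_index):
        # Seek straight to the record instead of opening one
        # of 1024 tiny files.
        with open(region_path, "rb") as f:
            f.seek(cell_index * CELL_SIZE)
            return f.read(CELL_SIZE)

    def write_cell(region_path, cell_index, data):
        assert len(data) == CELL_SIZE
        # "r+b" rewrites one record and leaves the rest intact.
        with open(region_path, "r+b") as f:
            f.seek(cell_index * CELL_SIZE)
            f.write(data)

That would turn 4.2 million 1 KB files into 4096 files of 1 MB each: 
the same 4 GB of data, but only 4096 inodes and directory entries 
instead of 4.2 million.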

I have checked the file systems on my old server. The root partition is 
2Gb (your cells would not fit there), uses 1K blocks, and has a total of 
1M inodes (again the cells would not fit). The /usr partition is 18Gb, 
uses a block size of 4K, and has a total of 1M inodes. Again, your data 
would not fit because of the number of files. The biggest partition is 
200Gb, uses 4K as block size, and counts 12M inodes. In this partition, 
your data would take 16Gb of space, but anyway it would fit, at the cost 
of transferring 4 times the useful data at every read (probably). I 
think the only pit is speed, and not disk space wasting. If there is 
enough ram in the computer to cache the 512Mb of metadata, access is 
very quick (10 milliseconds? Or less?). Otherwise, the same metadata 
must be read over and over, at every access, which means that the 
computer will read several Mb to access a single 1k file. These are the 
opposite and extreme situations - in the reality a certain amount of 
disk cache will always be in use. My old server has only 96Mb of ram, 
but as a normal file server performs not badly. Clearly, I don't have 4M 
files on it... :-)
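
If you want to check whether a given partition can take the load before 
creating anything, something like this (Python again, on any POSIX 
system; the helper name fs_report is my own) reads the same figures I 
quoted above:

    import os

    def fs_report(path, files_needed=4194304):
        st = os.statvfs(path)
        block = st.f_frsize                    # allocation unit in bytes
        print("block size:  ", block, "bytes")
        print("total inodes:", st.f_files)
        print("free inodes: ", st.f_favail)
        # Each 1 KB cell file still occupies one whole block.
        print("space needed:", files_needed * block // 2**30, "GB")
        if st.f_favail < files_needed:
            print("not enough inodes for", files_needed, "files")

    fs_report("/")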

Regards,
Doriano




