[Gambas-user] using a "file system database"

Doriano Blengino doriano.blengino at ...1909...
Fri Apr 15 09:56:42 CEST 2011


Kevin Fishburne ha scritto:
> I'm in the early phases of creating a "database" that uses the file 
> system for data organization rather than a traditional software database 
> such as MySQL, etc. I'm hoping that this could be faster since my 
> requirements are very specific and (I think) don't need a general 
> purpose database.
>
> I will have 4,194,304 "cells", each of which has about three datafiles 
> that will need to be opened, read from, written to and closed regularly. 
> I could consolidate the three files into one, reducing the number of 
> files, but this would increase the amount of time taken to parse the 
> files and slow the program significantly.
>
> I'm considering dividing them into hierarchies of directories to avoid 
> having four to 16 million data files in the same directory. Initial 
> tests hit file system (or file space, not sure yet) limits.
>
> Does anyone have any insights into the ext3/4 filesystem, possible 
> alternate file systems, and databases to know what the best solution 
> would be for this type of problem? Any insights into the most efficient 
> way to create hierarchies of directories and files would be appreciated 
> as well.
>
>   
Very interesting problem.
You could take a look at the Squid proxy, which faces a similar problem: 
how to cache files quickly.
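Squid's trick is to spread its cache over a small fixed hierarchy of 
directories so that no single directory grows huge. A minimal sketch of 
that idea in Python, assuming each of the 4,194,304 cells has a numeric 
id (the layout, names, and ".dat" extension here are made up for 
illustration):

```python
import os

# 4,194,304 cells = 2^22. Split the id into 64 top-level dirs,
# 256 subdirs each, leaving at most 256 files per subdirectory.
def cell_path(root, cell_id):
    """Map a cell id (0 .. 4194303) to a two-level directory path."""
    top = cell_id >> 16          # bits 16..21 -> 0 .. 63
    sub = (cell_id >> 8) & 0xFF  # bits 8..15  -> 0 .. 255
    return os.path.join(root, "%02x" % top, "%02x" % sub,
                        "%06x.dat" % cell_id)

# e.g. cell_path("/data", 4194303) -> "/data/3f/ff/3fffff.dat"
```

Any split that keeps directories to a few hundred or few thousand 
entries will do; the exact shift amounts are a tunable choice.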

For the file system, the more sophisticated it is, the more expensive it 
is to modify; ext3/4 and reiserfs are journaled, so they are slower and 
heavier to manage. But a well-planned journaled fs can still be faster 
than a badly planned non-journaled one... On the other hand, journaling 
should guarantee data solidity... I don't know where you want to set the 
balance between speed and reliability. Note that some of these 
filesystems (or all of them) don't actually guarantee "data" 
reliability, only metadata.

4M files are a lot. If you want speed, you should store them in binary 
format, coalesce some of them into one, and use seek() to navigate. 
Otherwise, the computer will spend more time searching for a file than 
actually reading it.
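For example, assuming fixed-size binary records, one consolidated file 
can be addressed directly with seek() and no parsing at all. A sketch in 
Python; the record size and file name are hypothetical:

```python
RECORD_SIZE = 32  # hypothetical fixed record length in bytes

def read_cell(f, index):
    """Read the record for cell `index` from an open binary file."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE)

def write_cell(f, index, data):
    """Overwrite the record for cell `index` in place."""
    assert len(data) == RECORD_SIZE
    f.seek(index * RECORD_SIZE)
    f.write(data)

# Usage: open once, then jump to any cell without parsing the file.
# with open("cells.dat", "r+b") as f:
#     rec = read_cell(f, 12345)
```

The cost of one access is then a single seek plus a small read, 
independent of how many cells the file holds.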

How big are these files? And can they be binary? Or at least some 
of them?

Regards,
Doriano
