[Gambas-user] using a "file system database"

Kevin Fishburne kevinfishburne at ...1887...
Mon Apr 18 07:24:58 CEST 2011


On 04/15/2011 03:56 AM, Doriano Blengino wrote:
> Kevin Fishburne wrote:
>> I'm in the early phases of creating a "database" that uses the file
>> system for data organization rather than a traditional software database
>> such as MySQL, etc. I'm hoping that this could be faster since my
>> requirements are very specific and (I think) don't need a general
>> purpose database.
>>
>> I will have 4,194,304 "cells", each of which has about three datafiles
>> that will need to be opened, read from, written to and closed regularly.
>> I could consolidate the three files into one, reducing the number of
>> files, but this would increase the amount of time taken to parse the
>> files and slow the program significantly.
>>
>> I'm considering dividing them into hierarchies of directories to avoid
>> having four to 16 million data files in the same directory. Initial
>> tests hit file system (or file space, not sure yet) limits.
>>
>> Does anyone have any insights into the ext3/4 filesystem, possible
>> alternate file systems, and databases to know what the best solution
>> would be for this type of problem? Any insights into the most efficient
>> way to create hierarchies of directories and files would be appreciated
>> as well.
>>
> Very interesting problem.
> You could take a look at the Squid proxy, which has similar problems:
> how to cache files quickly.
>
> For the file system, the more sophisticated it is, the more expensive
> it is to modify; ext3/4 and reiserfs are journaled, so they are slower
> and heavier to manage. But a well-planned journaled fs could still be
> faster than a badly planned non-journaled fs... On the other hand, the
> journaling should guarantee data solidity... I don't know where you
> want to set the balance between speed and reliability. Some of them
> (or all of them) don't actually guarantee "data" reliability, only
> metadata reliability.
>
> 4M files are a lot. If you want speed, you should store them in binary
> format, coalesce some of them into one, and use seek() to navigate.
> Otherwise, the computer will spend more time searching for a file than
> actually reading it.
>
> How big are these files? And can they be binary? Or at least some of
> them?

Hi Doriano. Good advice about functionality versus speed. I'm going to 
be testing both ext2 and xfs this week to see which is superior for my 
purposes. As far as data integrity goes, I'll probably have some sort 
of local RAID as a backup target, with slow, incremental writes to it. 
If the server dies, at least most of the game data will be preserved 
without harming the performance of the server app, so the weakness of 
the filesystem with regard to crash recovery is irrelevant.

Yes, the files are binary. I'll be reading their values directly into 
their corresponding datatypes, and writing values back the same way. 
There will be very few strings written. Each "field" of the data files 
will be a predetermined length so I can just jump ahead without even 
using Seek. Right now the field length is 64 bytes, which leaves me 
plenty of reserved space (32 bytes) for additional info.
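
For example, with fixed 64-byte fields the byte offset of any field is 
just its index times 64, so a value can be read straight into its 
datatype with no parsing. A minimal sketch in Gambas 3 syntax (the file 
name, field layout and function name are only illustrative):

    ' Read one Integer field from a cell file. Fields are 64 bytes
    ' each, so field N starts at byte N * 64. Names are examples only.
    Public Function ReadCellInteger(sFile As String, iField As Integer) As Integer

      Dim hFile As File
      Dim iValue As Integer

      hFile = Open sFile For Read
      Seek #hFile, iField * 64          ' jump to the start of the field
      iValue = Read #hFile As Integer   ' binary read, no string parsing
      Close #hFile
      Return iValue

    End

Writing works the same way, seeking to the field's offset and then 
doing Write #hFile, iValue As Integer.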

My current plan is to create a directory for each region: 
(65536/32/32)^2 = 64^2 = 4096 region directories. Each region directory 
contains 32^2 (1024) data files, for 4096 * 1024 = 4,194,304 files 
total. Hopefully this won't stress any particular file system as far as 
how many directories and files are contained within a single directory.
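
Mapping a cell to its directory and file then becomes pure arithmetic. 
A sketch of the idea, again in Gambas 3 syntax (the /data root and all 
the names are made up):

    ' Derive the region directory and cell file from world coordinates,
    ' assuming 32 world units per cell and 32 cells per region.
    Public Function CellPath(iWorldX As Integer, iWorldY As Integer) As String

      Dim iCellX, iCellY As Integer
      Dim sDir As String

      iCellX = iWorldX \ 32    ' which cell (0..2047)
      iCellY = iWorldY \ 32

      ' 64 x 64 = 4096 region directories, each with 32 x 32 = 1024 files
      sDir = "/data/region_" & CStr(iCellX \ 32) & "_" & CStr(iCellY \ 32)
      Return sDir & "/cell_" & CStr(iCellX Mod 32) & "_" & CStr(iCellY Mod 32) & ".dat"

    End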

-- 
Kevin Fishburne
Eight Virtues
www: http://sales.eightvirtues.com
e-mail: sales at ...1887...
phone: (770) 853-6271




