[Gambas-user] Another Best Approach Question: Statistics over an array of objects

Bruce bbruen at ...2308...
Wed Aug 7 15:22:23 CEST 2013


I'm looking for good ideas again, I'm afraid.


I have an array of objects that can be best described as a set of
categories with an associated value.  Something along the lines of 
[Cat1:String, Cat2:String, Cat3:String, Value:Float].

The general idea is that the user would select one of the categories and
then the project would calculate a set of statistics across that
category.  For example the categories could be "Age", "Sex", "Height"
and the Value may be, say, weight.

What I am trying to do is develop a generic module/class accepting the
input array that will return another array of objects being the
statistical analysis of the input array across the specified category.
The statistics are fairly basic (at this stage): the average for each
category value, the sample standard deviation and the sample standard error.
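
To make the intent concrete, here is a minimal sketch of the grouping
and the three statistics. It is written in Python purely for
illustration and would need porting to Gambas. The record layout (a
dict per item with a "Value" key) and the names stats_by_category, "n",
"mean", "sd" and "se" are assumptions made for this sketch, not part of
any existing module.

```python
import math
from collections import defaultdict


def stats_by_category(records, category):
    """Group records by one category field and summarise their 'Value'.

    records  -- list of dicts, each with the category keys and a "Value" key
                (assumed layout, for illustration only)
    category -- name of the key to group by, e.g. "Sex"
    """
    # Collect the values belonging to each distinct category value.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[category]].append(rec["Value"])

    results = []
    for key in sorted(groups):
        values = groups[key]
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            # Sample standard deviation uses the n - 1 denominator.
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            sd = math.sqrt(variance)
            # Standard error of the mean: sd / sqrt(n).
            se = sd / math.sqrt(n)
        else:
            # Neither statistic is defined for a single observation.
            sd = se = None
        results.append(
            {"category": key, "n": n, "mean": mean, "sd": sd, "se": se}
        )
    return results
```

With only ~30 to ~300 items per call, holding every group's values in
memory like this should be cheap, so nothing cleverer than a
dictionary of lists seems necessary.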

Generally the input array is reasonably short, ~30 to ~300 items.
The category domains are also small, between 3 and ~10 distinct
values each.

I could use (and have used) the database (postgresql) statistics
functions and re-query the entire dataset given the user category
selection.  However, this is unacceptably slow (the full dataset is
over 3,000,000 rows). Furthermore, I would have to devise a generic
way to build the specific query required each time.
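
For reference, the "generic way to build the query" part could look
roughly like the sketch below, again in Python for illustration only.
The table name "measurements", the column names "cat1", "cat2", "cat3"
and "value", and the function build_stats_query are all hypothetical.
AVG, STDDEV_SAMP, COUNT and SQRT are standard PostgreSQL functions.
This only addresses query construction; it does nothing about the
execution time against the full dataset.

```python
# Hypothetical column names; only these may be interpolated into the SQL.
ALLOWED_CATEGORIES = {"cat1", "cat2", "cat3"}


def build_stats_query(category, table="measurements", value_column="value"):
    """Return a GROUP BY query giving n, mean, sd and se per category value.

    The category name is checked against a whitelist because identifiers
    cannot be passed as bound parameters. table and value_column are
    assumed to come from trusted code, not from user input.
    """
    if category not in ALLOWED_CATEGORIES:
        raise ValueError("unknown category column: %r" % (category,))
    return (
        f"SELECT {category} AS category, "
        f"COUNT({value_column}) AS n, "
        f"AVG({value_column}) AS mean, "
        f"STDDEV_SAMP({value_column}) AS sd, "
        # Standard error of the mean: sample sd divided by sqrt(n).
        f"STDDEV_SAMP({value_column}) / SQRT(COUNT({value_column})) AS se "
        f"FROM {table} "
        f"GROUP BY {category} "
        f"ORDER BY {category}"
    )
```

Any WHERE clause that narrows the 3,000,000 rows down to the few
hundred of interest would still have to be added on top of this.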

Another way could be to develop an interface to R or something, but I am
hesitant to embark on that path given my limited knowledge of stats
libraries like that.

So, just looking for a "good idea".

regards
Bruce




