[Gambas-user] Another Best Approach Question: Statistics over an array of objects

Bruce bbruen at ...2308...
Wed Aug 7 16:26:34 CEST 2013


On Wed, 2013-08-07 at 15:58 +0200, Tobias Boege wrote:
> On Wed, 07 Aug 2013, Bruce wrote:
> > I'm looking for good ideas again I'm afraid.
> > 
> > 
> > I have an array of objects that can be best described as a set of
> > categories with an associated value.  Something along the lines of 
> > [Cat1:String, Cat2:String, Cat3:String, Value:Float].
> > 
> > The general idea is that the user would select one of the categories and
> > then the project would calculate a set of statistics across that
> > category.  For example the categories could be "Age", "Sex", "Height"
> > and the Value may be, say, weight.
> > 
> > What I am trying to do is develop a generic module/class accepting the
> > input array that will return another array of objects being the
> > statistical analysis of the input array across the specified category.
> > The statistics are fairly basic (at this stage), being the average for each
> > category, the sample standard deviation and the sample standard error.
> > 
> > Generally the input array length is reasonably short, ~30 to ~300 items.
> > Also the category domains are demonstrably short, between 3 and ~10
> > identities.
> > 
> > I could (and have done) use the database (postgresql) statistics
> > functions and re-query the entire dataset given the user category
> > selection.  However, the time to execute this is unacceptably slow (the
> > full dataset is over 3,000,000 rows). Furthermore, I would have to
> > devise a generic way to build the specific query required each time.
> > 
> > Another way could be to develop an interface to r or something but I am
> > hesitant to embark on that path given my knowledge of stats libraries
> > like that.
> > 
> > So, just looking for a "good idea".
> > 
> 
> Sorry, I don't understand... You want to give a Variant[] to a class, like:
> 
> Public Sub btnGiveStats_Click()
>   Dim hStats As Stats
> 
>   hStats = Stats.Give(["Age", "Sex", "Height", fWeight])
> End
> 
> Right?
> 
> What is this array supposed to signify? On what data shall it operate? I
> mean: is there a table of persons (with fields Age, Sex, Height, ...) and
> the function shall count ... something? Is the "weight" used to weigh the
> average figure?
> 
> Thinking about it further, I admit that I don't understand anything... at
> all. :-)
> 
> Regards,
> Tobi

OK, rephrased.

I'm looking for good ideas to create a generic statistics module with a
function:
 Analyse(category as Integer, data_array as <someclass>[]) as Analysis[]

<someclass>[] is an array of objects; each object consists of an
unknown number of category properties and a value property.  Analysis is
a class that exposes some basic statistics of "value" across the specified
"category".

In short, Analyse(category, data_array) returns a kind of crosstab of
the value against the selected category.
So we could get a user directive to analyse "Weight" (the value) across
"Age" (the category) and the returned array would be
[{"2years",12.3432, 1.123, 0.34}, {"3years", 14.1643,1.112,0.01},{"4years",16.954,2.001,0.13}, etc]
where the {} contents are the properties of the Analysis class, viz.
Category, Average, StdDev, StdErr.
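
To make the intent concrete, here is a minimal sketch of the grouping
and the three statistics. It is written in Python rather than Gambas,
purely as an illustration. The Record class, its field names and the
use of a property name (instead of the Integer category index in the
signature above) are all assumptions made for the example. StdDev is
the sample standard deviation (n - 1 denominator) and StdErr is
StdDev / sqrt(n).

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class Record:
    """Hypothetical input item: some category fields plus one value."""
    age: str
    sex: str
    value: float


@dataclass
class Analysis:
    category: str
    average: float
    std_dev: float  # sample standard deviation (n - 1 denominator)
    std_err: float  # std_dev / sqrt(n)


def analyse(category, data):
    """Summarise `value` for each distinct value of the named category."""
    groups = defaultdict(list)
    for record in data:
        # Look the category up by name so any category property works.
        groups[getattr(record, category)].append(record.value)

    results = []
    for key, values in groups.items():
        n = len(values)
        # stdev() needs at least two points; use 0.0 for a single item.
        sd = stdev(values) if n > 1 else 0.0
        results.append(Analysis(str(key), mean(values), sd, sd / sqrt(n)))
    return results
```

With only ~30 to ~300 items and 3 to ~10 distinct category values, a
single pass like this over the in-memory array should be cheap, so no
trip back to the database is needed for each category selection.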

The question is whether it would be best to write the statistical analysis
routines from scratch, or whether there is a better (or easier) way using
a) "known" libraries,
b) a set of generic methods that use the underlying database stats functions, or
c) a published gambas component?
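
For option b), the generic part could be as small as a function that
builds the GROUP BY query from the chosen category. The sketch below is
again Python for illustration only. The function name, the table and
column names and the whitelist argument are assumptions, while avg(),
stddev_samp(), count() and sqrt() are standard PostgreSQL functions. It
does nothing about the run time over the full dataset; it only shows
that the query text itself can be made generic.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_stats_query(table, category, value, allowed_columns):
    """Return a PostgreSQL query giving mean, sample SD and SE per category.

    Identifiers cannot be passed as bound parameters, so they are checked
    against a whitelist and a strict pattern before being interpolated.
    """
    for column in (category, value):
        if column not in allowed_columns:
            raise ValueError("unknown column: " + column)
    for name in (table, category, value):
        if not _IDENT.match(name):
            raise ValueError("invalid identifier: " + name)

    return (
        f'SELECT "{category}" AS category, '
        f'avg("{value}") AS average, '
        f'stddev_samp("{value}") AS std_dev, '
        f'stddev_samp("{value}") / sqrt(count("{value}")) AS std_err '
        f'FROM "{table}" '
        f'GROUP BY "{category}" ORDER BY "{category}"'
    )
```

For example, build_stats_query("samples", "sex", "weight",
{"age", "sex", "height", "weight"}) yields one query grouped by "sex",
and switching the category argument is all that changes per user
selection.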

regards
Bruce






