[Gambas-user] Another Best Approach Question: Statistics over an array of objects

Bruce bbruen at ...2308...
Wed Aug 7 16:34:43 CEST 2013


On Wed, 2013-08-07 at 23:56 +0930, Bruce wrote:
> On Wed, 2013-08-07 at 15:58 +0200, Tobias Boege wrote:
> > On Wed, 07 Aug 2013, Bruce wrote:
> > > I'm looking for good ideas again I'm afraid.
> > > 
> > > 
> > > I have an array of objects that can be best described as a set of
> > > categories with an associated value.  Something along the lines of 
> > > [Cat1:String, Cat2:String, Cat3:String, Value:Float].
> > > 
> > > The general idea is that the user would select one of the categories and
> > > then the project would calculate a set of statistics across that
> > > category.  For example the categories could be "Age", "Sex", "Height"
> > > and the Value may be, say, weight.
> > > 
> > > What I am trying to do is develop a generic module/class accepting the
> > > input array that will return another array of objects being the
> > > statistical analysis of the input array across the specified category.
> > > The statistics are fairly basic (at this stage): the average for each
> > > category, the sample standard deviation and the sample standard error.
> > > 
> > > Generally the input array length is reasonably short, ~30 to ~300 items.
> > > Also the category domains are demonstrably short, between 3 and ~10
> > > identities.
> > > 
> > > I could (and have done) use the database (postgresql) statistics
> > > functions and re-query the entire dataset given the user category
> > > selection.  However, the time to execute this is unacceptably slow (the
> > > full dataset is over 3,000,000 rows). Furthermore, I would have to
> > > devise a generic way to build the specific query required each time.
> > > 
> > > Another way could be to develop an interface to R or something similar,
> > > but I am hesitant to embark on that path given my knowledge of stats
> > > libraries like that.
> > > 
> > > So, just looking for a "good idea".
> > > 
> > 
> > Sorry, I don't understand... You want to give a Variant[] to a class, like:
> > 
> > Public Sub btnGiveStats_Click()
> >   Dim hStats As Stats
> > 
> >   hStats = Stats.Give(["Age", "Sex", "Height", fWeight])
> > End
> > 
> > Right?
> > 
> > What is this array supposed to signify? On what data shall it operate? I
> > mean: is there a table of persons (with fields Age, Sex, Height, ...) and
> > the function shall count ... something? Is the "weight" used to weight the
> > average figure?
> > 
> > Thinking about it further, I admit that I don't understand anything... at
> > all. :-)
> > 
> > Regards,
> > Tobi
> 
> OK, rephrased.
> 
> I'm looking for good ideas to create a generic statistics module with a
> function:
>  Analyse(category as Integer, data_array as <someclass>[]) as Analysis[]
> 
> <someclass>[] is an array of objects; these objects consist of an
> unknown number of category properties and a value property.  Analysis is
> a class that exposes some basic statistics of "value" across the specified
> "category".
> 
> In short, Analyse(category, data_array) returns a kind of crosstab of the value against the selected category.
> So we could get a user directive to analyse "Weight" (the value) across "Sex" (the category) and the returned array would be
> [{"2years",12.3432, 1.123, 0.34}, {"3years", 14.1643,1.112,0.01},{"4years",16.954,2.001,0.13}, etc]
> where the {} contents are the properties of the Analysis class, viz Category,Average,StdDev,StdErr.
> 
> The question is whether it would be best to write statistical analysis routines
> from scratch, or whether there is a better (or easier) way using one of:
> a) "known" libraries,
> b) a set of generic methods that use the underlying database stats functions, or
> c) a published Gambas component?
> 
> regards
> Bruce
> 
Oops, I meant
> In short, Analyse(category, data_array) returns a kind of
> crosstab of the value against the selected category.
> So we could get a user directive to analyse "Weight" (the value)
> across "Age" (the category) and the returned array would be
> [{"2years",12.3432, 1.123, 0.34}, {"3years",
> 14.1643,1.112,0.01},{"4years",16.954,2.001,0.13}, etc]
> where the {} contents are the properties of the Analysis class, viz
> Category,Average,StdDev,StdErr.
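
For the from-scratch route, the grouping and the three statistics can be
sketched roughly as below. This is only an illustration, written in Python
rather than Gambas to show the logic. The names analyse, Analysis and the
"value" key are assumptions made for the sketch, and the rows are plain
dictionaries instead of objects. The category is passed by name here, not
as an Integer index.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class Analysis:
    """One result row: statistics of the value within a single category."""
    category: str
    average: float
    std_dev: float   # sample standard deviation (n - 1 denominator)
    std_err: float   # sample standard error: std_dev / sqrt(n)


def analyse(category, data, value="value"):
    """Group rows by `category` and summarise `value` for each group.

    `data` is assumed to be a list of dicts that all contain the keys
    `category` and `value` (hypothetical layout for this sketch).
    """
    groups = defaultdict(list)
    for row in data:
        groups[row[category]].append(row[value])

    result = []
    for key in sorted(groups):
        values = groups[key]
        n = len(values)
        # The sample standard deviation is undefined for a single
        # observation, so report NaN instead of raising an error.
        sd = stdev(values) if n > 1 else float("nan")
        result.append(Analysis(key, mean(values), sd, sd / sqrt(n)))
    return result


if __name__ == "__main__":
    # Made-up rows, only to show the call shape.
    rows = [
        {"age": "2years", "sex": "M", "value": 12.0},
        {"age": "2years", "sex": "F", "value": 14.0},
        {"age": "3years", "sex": "M", "value": 15.0},
    ]
    for item in analyse("age", rows):
        print(item)
```

With only ~30 to ~300 items and 3 to ~10 identities per category, a single
pass to group followed by one pass per group should be more than fast
enough, so a hand-written routine of this size may be all that is needed.
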
B





