[Gambas-user] Another Best Approach Question: Statistics over an array of objects
Tobias Boege
taboege at ...626...
Thu Aug 8 12:51:28 CEST 2013
On Thu, 08 Aug 2013, Bruce wrote:
> On Wed, 2013-08-07 at 23:56 +0930, Bruce wrote:
> > On Wed, 2013-08-07 at 15:58 +0200, Tobias Boege wrote:
> > > On Wed, 07 Aug 2013, Bruce wrote:
> > > > I'm looking for good ideas again I'm afraid.
> > > >
> > > >
> > > > I have an array of objects that can be best described as a set of
> > > > categories with an associated value. Something along the lines of
> > > > [Cat1:String, Cat2:String, Cat3:String, Value:Float].
> > > >
> > > > The general idea is that the user would select one of the categories and
> > > > then the project would calculate a set of statistics across that
> > > > category. For example the categories could be "Age", "Sex", "Height"
> > > > and the Value may be, say, weight.
> > > >
> > > > What I am trying to do is develop a generic module/class accepting the
> > > > input array that will return another array of objects being the
> > > > statistical analysis of the input array across the specified category.
> > > > The statistics are fairly basic (at this stage) being the average for each
> > > > category, the sample standard deviation and the sample standard error.
> > > >
> > > > Generally the input array length is reasonably short, ~30 to ~300 items.
> > > > Also the category domains are demonstrably short, between 3 and ~10
> > > > identities.
> > > >
> > > > I could (and have done) use the database (postgresql) statistics
> > > > functions and re-query the entire dataset given the user category
> > > > selection. However, the time to execute this is unacceptably slow (the
> > > > full dataset is over 3,000,000 rows). Furthermore, I would have to
> > > > devise a generic way to build the specific query required each time.
> > > >
> > > > Another way could be to develop an interface to r or something but I am
> > > > hesitant to embark on that path given my knowledge of stats libraries
> > > > like that.
> > > >
> > > > So, just looking for a "good idea".
> > > >
> > >
> > > Sorry, I don't understand... You want to give a Variant[] to a class, like:
> > >
> > > Public Sub btnGiveStats_Click()
> > > Dim hStats As Stats
> > >
> > > hStats = Stats.Give(["Age", "Sex", "Height", fWeight])
> > > End
> > >
> > > Right?
> > >
> > > What is this array supposed to signify? On what data shall it operate? I
> > > mean: is there a table of persons (with fields Age, Sex, Height, ...) and
> > > the function shall count ... something? Is the "weight" used to weigh the
> > > average figure?
> > >
> > > Thinking about it further, I admit that I don't understand anything... at
> > > all. :-)
> > >
> > > Regards,
> > > Tobi
> >
> > OK, rephrased.
> >
> > I'm looking for good ideas to create a generic statistics module with a
> > function:
> > Analyse(category as Integer, data_array as <someclass>[]) as Analysis[]
> >
> > <someclass>[] is an array of objects, these objects consist of an
> > unknown number of category properties and a value property. Analysis is
> > a class that exhibit some basic statistics of "value" across the specified
> > "category".
> >
> > In short, Analysis(category,data_array) is returning a kind of a crosstab of the value against the selected category.
> > So we could get a user directive to analyse "Weight" (the value) across "Sex" (the category) and the returned array would be
> > [{"2years",12.3432, 1.123, 0.34}, {"3years", 14.1643,1.112,0.01},{"4years",16.954,2.001,0.13}, etc]
> > where the {} contents are the properties of the Analysis class, viz Category,Average,StdDev,StdErr.
> >
> > The question is whether it would be best to write statistical analysis routines from
> > scratch or is there a better (or easier) way using either
> > a) "known" libraries, or
I don't know any... I'd write this as a Gambas module out of pure naivety.
> > b) developing a set of generic methods to use the underlying database stats functions
> > c) a published gambas component?
> >
> > regards
> > Bruce
> >
> Oops, I meant
> > In short, Analysis(category,data_array) is returning a kind of a
> > crosstab of the value against the selected category.
> > So we could get a user directive to analyse "Weight" (the value)
> > across "AGE" (the category) and the returned array would be
> > [{"2years",12.3432, 1.123, 0.34}, {"3years",
> > 14.1643,1.112,0.01},{"4years",16.954,2.001,0.13}, etc]
> > where the {} contents are the properties of the Analysis class, viz
> > Category,Average,StdDev,StdErr.
> B
To my ears, this sounds more like an introspection problem than a
mathematical one, right?

For a given category (age), there are multiple properties in each object (I
assume that the property names are the same across all objects, though), each
of which contains a value.

The algorithm would be, IIUC:

 1. Ask one of the objects to give you the names of all properties belonging
    to the category.
 2. Enumerate all of these property name strings (sProp):
    2.1. For each object in data_array, get the value of the current property.
    2.2. Do the math.

As a function it would look like:

  Public Struct Analysis
    Name As String
    Average As Float
    StdDev As Float
    StdErr As Float
  End Struct

  Public Sub Analysis(iCat As Integer, aObjs As Object[]) As Analysis[]
    Dim aResult As New Analysis[]
    Dim hAnalysis As Analysis
    Dim sProp As String
    Dim hObject As Object
    Dim iValue As Integer

    ' Get the names of all properties in the objects which are associated with
    ' the given category (as a String[])
    For Each sProp In aObjs[0].AssociatedProperties(iCat)
      hAnalysis = New Analysis
      hAnalysis.Name = sProp
      For Each hObject In aObjs
        iValue = Object.GetProperty(hObject, sProp)
        ' Do the math
      Next
      aResult.Add(hAnalysis)
    Next
    ' Return the whole result array, not just the last element
    Return aResult
  End

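The "Do the math" step is left open above. A minimal sketch of it, assuming
the values read in the inner loop are first collected into a Float[]: the
function name ComputeStats and its return layout are made up for this
illustration, they are not part of any Gambas component. It uses the usual
sample formulas, i.e. mean = sum(x) / n, StdDev = Sqr(sum((x - mean)^2) / (n - 1))
and StdErr = StdDev / Sqr(n).

```gambas
' Hypothetical helper (name and return layout are assumptions):
' returns [Average, StdDev, StdErr] for the given values.
Public Function ComputeStats(aValues As Float[]) As Float[]

  Dim fValue As Float
  Dim fSum As Float
  Dim fSumSq As Float
  Dim fAverage As Float
  Dim fStdDev As Float
  Dim fStdErr As Float
  Dim iCount As Integer

  iCount = aValues.Count
  If iCount = 0 Then Return [0.0, 0.0, 0.0]

  ' Arithmetic mean
  For Each fValue In aValues
    fSum += fValue
  Next
  fAverage = fSum / iCount

  ' The sample standard deviation divides by (n - 1), so it needs
  ' at least two values; with fewer it is left at zero here.
  If iCount >= 2 Then
    For Each fValue In aValues
      fSumSq += (fValue - fAverage) ^ 2
    Next
    fStdDev = Sqr(fSumSq / (iCount - 1))
    fStdErr = fStdDev / Sqr(iCount)
  Endif

  Return [fAverage, fStdDev, fStdErr]

End
```

In the loop above one would then create a New Float[] per property, Add() each
value obtained from Object.GetProperty() to it, and copy the three returned
numbers into hAnalysis.Average, hAnalysis.StdDev and hAnalysis.StdErr after
the inner loop. With only ~30 to ~300 items, two passes over the values
should not matter.
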
The difficult and most maintenance-heavy part is the
AssociatedProperties() function: each class which you want to analyse has to
implement it. I think of it like:

  ' This is SomeClass.class
  Property 2years As Integer
  Property 3years As Integer
  Property 4years As Integer

  Public Function AssociatedProperties(iCat As Integer) As String[]
    Select Case iCat
      Case CategoryAge
        Return ["2years", "3years", "4years"]
      Case CategorySex
        Return ...
      Case Category...
        Return ...
    End Select
  End

Of course, it would be much easier if you followed a specific pattern, i.e.
if you named all "age" properties like "Xyears" you can iterate over all
symbols in a class and dynamically find the property names:

  Property 2years As Integer
  Property 3years As Integer
  Property 4years As Integer

  Public Function AssociatedProperties(iCat As Integer) As String[]
    Dim aProps As New String[]
    Dim sSym As String

    For Each sSym In Object.Class(Me).Symbols
      If Object.Class(Me)[sSym].Kind = Class.Property Then
        Select Case iCat
          Case CategoryAge
            If sSym Ends "years" Then aProps.Add(sSym)
          Case CategorySex
            ...
          Case ...
            ...
        End Select
      Endif
    Next
    Return aProps
  End

I hope this mail isn't overkill and helps you a bit further :-)
Regards,
Tobi