[Gambas-user] any way to convert Result to Collection faster than copy?

adamnt42 at ...626...
Sat Jul 1 09:36:21 CEST 2017


On Fri, 30 Jun 2017 08:41:49 -0400
PICCORO McKAY Lenz <mckaygerhard at ...626...> wrote:

> I get more than 30 minutes, because I have to parse it on a low-end machine, not
> your 4-core, 16GB RAM super-powered machine. I'm talking about 1GB RAM and a
> single-core 1.6GHz Atom CPU.
> 
> I need to convert from the Result/cursor to something else because of the ODBC
> driver's lack of cursor/count.
> 
> I'm thinking about using an in-memory SQLite structure; how can I force it?
> The documentation says "If Name is null, then a memory database is opened." for
> SQLite.
> 
> So would using a memory structure be a good idea? *I tested it yesterday and it
> took about 10 minutes, but I don't know whether I have a problem in my Gambas
> installation!*
> 
> 
> 
> Lenz McKAY Gerardo (PICCORO)
> http://qgqlochekone.blogspot.com
> 
> 2017-06-30 4:09 GMT-04:00 adamnt42 at ...626... <adamnt42 at ...626...>:
(SNIP)
> > Here's the timing output.
> >
> > 17:05:59:706    Connecting to DB
> > 17:06:00:202    Loading Data    <---- so 496 mSec to establish the db
> > connection
> > 17:06:31:417    556502 rows     <---- so 31,215 mSec to execute the query
> > and return the result
> > 17:06:31:417    Unmarshalling result started
> > 17:06:44:758    Unmarshalling completed 556502 rows processed  <---  so 13,341 mSec to unmarshall the result into an array of structs
> >
> > So, it took roughly 31 seconds to execute the query and return the result
> > of half a million rows.
> > To unmarshall that result into the array took just over 13 seconds. The
> > unmarshalling is fairly well a straight field by field copy.
> > (Also I must add, I ran this on a local db copy on my old steam-driven
> > laptop, 32 bits and about 1G of memory.)
> >
(CORRECTED)
> > That's about 42 rows per mSec unmarshalling time or about 0.024 mSec per row.
> >

Well, 30 minutes does sound very excessive. Are you certain that it's the "unmarshalling" that is taking the time and not the execution of the query itself? That is why I separated the timings in my figures above.
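
By the way, it is easy to separate the two timings if you haven't already. A minimal sketch of what I mean (untested; hConn is assumed to be an already open Connection, the table and column names are invented, and I copy into two plain arrays rather than an array of structs just to keep it short):

    Dim rData As Result
    Dim fStart As Float
    Dim aIds As New Integer[]
    Dim aNames As New String[]

    fStart = Timer
    rData = hConn.Exec("SELECT id, name FROM mytable")  ' query execution
    Print "Query: "; Timer - fStart; " seconds"

    fStart = Timer
    While rData.Available                               ' field by field copy
      aIds.Add(rData!id)
      aNames.Add(rData!name)
      rData.MoveNext()
    Wend
    Print "Unmarshalling: "; Timer - fStart; " seconds"

Whichever of the two numbers dominates tells you where to dig.
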
Regarding your machine's capability, my laptop is very similar to what you described (single core, 1GB memory). The only real difference I can see is a 1.7GHz maximum clock speed.
So I don't think that's the cause of the difference. If I imagine running your query on this PC, I would expect unmarshalling to take about 200000 * 0.024 mSec, say about 5 seconds.

Regarding the memory-based SQLite database approach, I wouldn't think it would help. I don't know the actual "size" of the data returned by your query, but I would expect you would take a major memory hit and a lot of paging by going that way. I have used the memory SQLite database several times for manipulating a few hundred records and it is quite fast, but I wouldn't even consider it for a dataset that large (and I guess it would just be adding another layer of processing to handle your query Result).
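
That said, forcing the memory database is just a matter of leaving Name null, exactly as the documentation you quoted says. A rough, untested sketch (the staging table is made up):

    Dim hMem As New Connection

    hMem.Type = "sqlite3"
    hMem.Name = Null    ' a null Name opens a memory database
    hMem.Open()

    hMem.Exec("CREATE TABLE staging (id INTEGER, name TEXT)")
    ' ... copy the rows from your ODBC Result into staging here ...
    hMem.Close()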

By the way, where is your source database? Is it on your machine or on a networked machine? I had one of the lads in our office try the same thing that I did, but using the master database on our LAN. It took a bit longer, 38 seconds to execute the query rather than 31, so as I expected, network access to the database plays a fairly large part: ~20% for a query returning a set that large.

Query optimisation? We tend to use the Connection.Exec approach here for large queries, as it lets us optimise both the database and the SQL for maximum benefit rather than rely on the Gambas driver generated queries. (That's not a criticism by the way, it's just that when dealing with large datasets our results have been better.) For example, in the query I have been talking about and timing, we create a temporary index on a boolean column that is one of the WHERE clause criteria, with the NULLS FIRST option set on the index. Since we are looking to select all the rows from that table where a flag (the "reconciled" column) has not been set, they are all at the front of that index. As soon as the back end query engine hits an index entry for a row that has been reconciled, it "knows" that it has finished. At the end of the query we just delete that index again. Before I did that, the query execution time was several minutes; now we are down to about 5 seconds (for the "real" query on the "real" database, which returns up to 10000 rows).
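
In code, the pattern is roughly this (a sketch only: "transactions" and the index name are placeholders for our real schema, and the NULLS FIRST index option is PostgreSQL syntax, so check what your back end supports):

    Dim rData As Result

    ' Temporary index puts the unreconciled (NULL) rows at the front
    hConn.Exec("CREATE INDEX tmp_reconciled_idx ON transactions (reconciled NULLS FIRST)")

    ' The back end can stop scanning once it is past the NULL entries
    rData = hConn.Exec("SELECT * FROM transactions WHERE reconciled IS NULL")

    ' ... process the result, then drop the index again ...
    hConn.Exec("DROP INDEX tmp_reconciled_idx")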

So again, I would be looking for other causes of that massive time if I were you.

rgrds
b

-- 
B Bruen <adamnt42 at ...3379... (sort of)>



