[Gambas-user] Size of a result in memory

Tobias Boege taboege at gmail.com
Wed Oct 18 17:23:39 CEST 2017


On Wed, 18 Oct 2017, Christof Thalhofer wrote:
> Hello Tobias,
> 
> thank you for your great explanation.
> 
> Am 18.10.2017 um 11:05 schrieb Tobias Boege:
> 
> > So, you can't get the information you want, but the good news is that this
> > information wouldn't have helped you either way. Unless your DBMS is made
> > for tiny amounts of data, you shouldn't expect the dynamic memory allocated
> > by the Result object to be indicative of the size of the query result.
> > A Result object is usually just a small (almost fixed-size) object that is
> > associated to a database connection and a query and allows you to fetch or
> > navigate through the tuples. The DBMS loads the data on-demand, because
> > you would expect that the entirety of a database result would not fit into
> > your memory anyway.
> 
> But I think your assumption is not right.
> 
> Well, the reason for my answer was that I wrote a class that inherits
> gb.db.Connection and that is able to cache results.
> 
> It is for situations where fetching the result costs a lot of time and
> the result will be needed more than once. This works, I am able to cache
> results. To test it I wrote a piece of code that opens the connection,
> then fetches the result; the next time the same SQL string is queried,
> it delivers the cached result:
> 
> MTestme.TesteDBCache.44: Init connection for later usage 906
> MTestme.TesteDBCache.51: Query1 5000 tuples duration in ms 7397
> MTestme.TesteDBCache.61: Cached query 5000 tuples duration in ms 0
> 
> You see that in position 3 the duration is 0 ms. The result comes out of
> the cache (a Collection with the SQL query string as key).
> 
> Now I have changed the code to do a simple
> Connection.Exec(qry):
> 
> MTestme.TesteDBCache.44: Init connection for later usage 784
> MTestme.TesteDBCache.51: Query1 5000 tuples duration in ms 6228
> MTestme.TesteDBCache.61: identical Query duration in ms 5365
> 
> Here you see that in position 3 the result is fetched again from the
> database and that this operation takes a considerable amount of time
> (5365 ms).
> 
> ---------
> 
> For me the conclusion is that the result fetches the 5000 tuples once
> and then keeps them inside the result object on the local computer for
> later usage.
> 

You know, I was about to write that the result object, or the library
that is used by the database driver in Gambas, probably only prefetches
a fixed number (> 1) of records to reduce the overhead incurred by
network communication. But you were right: I was only assuming what I
said last time, and since I have a table at hand, in a local SQLite3
database, which is 5.5 GiB in size, I thought I could provide numbers
as well. I wrote a Gambas program that connects to the database and
does a "SELECT * FROM table", then iterates through the entire result.

Guess what, watching its memory usage in htop, I see that it consumes
about 2 GiB, then segfaults. This is not what I expected, even having
the SQLite3 limits[1] in mind. In fact, my perl6 program, which usually
operates this database, has no trouble doing exactly the same thing,
and it uses a constant amount of memory the whole time.

I would dare to say that there is something wrong in some part of gb.db.
I'm curious about that but my leisure time for today was used up by these
experiments.

> As I use these cached results a lot in my programs I would like to
> delete old cached results if an upper limit of memory usage is reached.
> 
> But therefore I have to know, how "heavy" a result is.
> 

One thing you can do is go through each field in each record of the Result
and sum their sizes (since you measure the size of an Integer differently
from that of a String or Blob, you use a SELECT CASE on the field's type).
This will be slow, but it seems to be more or less what people suggest[2][3].
Another suggestion[4], specific to MySQL, is to create another table to
store the result and exploit the fact that the DBMS does track the size
of its tables. This is surely not an option for you.
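
A rough sketch of that first approach, assuming the usual gb.db
Result/ResultField API (the per-type byte counts are only estimates,
not what gb.db actually allocates internally):

  ' Estimate the size of a Result by walking every field of every record
  ' and summing per-type guesses.
  Public Function EstimateResultSize(hResult As Result) As Long

    Dim hField As ResultField
    Dim lBytes As Long

    For Each hResult
      For Each hField In hResult.Fields
        Select Case hField.Type
          Case db.Boolean
            lBytes += 1
          Case db.Integer
            lBytes += 4
          Case db.Long, db.Float, db.Date, db.Serial
            lBytes += 8
          Case db.String
            lBytes += Len(hResult[hField.Name])
          Case db.Blob
            If hResult[hField.Name] Then
              lBytes += hResult[hField.Name].Length
            Endif
        End Select
      Next
    Next

    Return lBytes

  End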

My proposal would be this: take the *row count* as an approximation to the
cache size and limit the cache in terms of records. You can easily[*] get
the row count of a result and maintain the number of rows stored in your
cache.
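
A sketch of the kind of bookkeeping I mean, assuming the cache is a
Collection keyed by the SQL string as you described (the names, the
limit and the "refuse to cache" policy are only illustrative; a real
implementation would evict old entries instead):

  Private $colCache As New Collection   ' SQL string -> cached result
  Private $colRows As New Collection    ' SQL string -> its row count
  Private $lCachedRows As Long          ' total rows currently cached

  Private Const MAX_CACHED_ROWS As Long = 100000

  Public Sub CacheResult(sQuery As String, hResult As Result)

    ' Refuse to cache if the row limit would be exceeded.
    If $lCachedRows + hResult.Count > MAX_CACHED_ROWS Then Return

    $colCache[sQuery] = hResult
    $colRows[sQuery] = hResult.Count
    $lCachedRows += hResult.Count

  End

  Public Sub UncacheResult(sQuery As String)

    If Not $colCache.Exist(sQuery) Then Return
    $lCachedRows -= $colRows[sQuery]
    $colCache.Remove(sQuery)
    $colRows.Remove(sQuery)

  End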

Regards,
Tobi

[1] https://sqlite.org/limits.html
[2] https://stackoverflow.com/questions/4524019/how-to-get-the-byte-size-of-resultset-in-an-sql-query
[3] https://dba.stackexchange.com/questions/156362/how-to-get-the-size-of-the-result-of-a-sql-query
[4] https://stackoverflow.com/questions/22695288/is-there-a-way-to-calculate-the-size-of-a-mysql-query-response

[*] Unless you're using an ODBC database, because those give you forward-
    only cursors in Gambas and you have to take care to do the counting
    at the same time you read the data into your cache, because there is
    no going back. (But note that this remark comes from my memory of what
    has been discussed on gambas-{user,devel}, it might not be accurate.)

-- 
"There's an old saying: Don't change anything... ever!" -- Mr. Monk


