Discussion:
Cursor read fail at exactly 40,000 record point
Gerard J. Nicol
2007-05-22 21:32:38 UTC
Permalink
I have an application that is using Berkeley DB 4.5.20 under Windows
2003.

One particular database has around 200,000 records and regularly
creates a cursor, positions at starting points and reads records until
the records go out of scope.

In one particular instance the application reads a set of records that
amount to over 50,000 records. Usually all is OK, but every few days
the read starts returning only exactly 40,000 records for the same
set. This continues to occur until the application is restarted and
the environment renewed. When the application is restarted the correct
number of records is returned.

Can anybody suggest what could be causing this error?

Gerard
m***@gmail.com
2007-05-23 04:56:36 UTC
Permalink
Hi Gerard,
Post by Gerard J. Nicol
One particular database has around 200,000 records and regularly
creates a cursor, positions at starting points and reads records until
the records go out of scope.
In one particular instance the application reads a set of records that
amount to over 50,000 records. Usually all is OK, but every few days
the read starts returning only exactly 40,000 records for the same
set. This continues to occur until the application is restarted and
the environment renewed. When the application is restarted the correct
number of records is returned.
I don't understand why this would be happening. Can you give a few
more details:

* what type of database are you using (btree, hash, etc.)?
* is the database configured to support duplicates?
* how do you read the records (just cursor gets with DB_SET followed
by some number of DB_NEXT calls, or something more exotic)?
* how is the end of set determined (key goes out of scope, DB_NOTFOUND
at the end of the database, or some other condition)?

Regards,
Michael Cahill, Oracle.
Gerard J. Nicol
2007-05-23 09:24:32 UTC
Permalink
Michael,

I was getting no errors from the database (even with verbose messages
enabled). I have just tracked the problem down to:

(1) The database cache was set to 1 GB on a machine with 8 GB of RAM.
(2) It would appear that Windows limits the user-mode address space of
a 32-bit process to 2 GB by default.
(3) Each record is 300 bytes.
(4) I use a custom array function that automatically grows (via
realloc). In this case it was set to grow by 10,000 records each time.

It would appear that the database cache was monopolizing most of the
available address space. When the array grew and the record set got
large, the realloc needed enough contiguous memory to hold both the
source and destination copies of the array.

The array function was returning a failure when it could not grow, but
I was not catching it.

So in the end it was not the database environment as such; it was the
cache consuming the address space and leaving insufficient contiguous
memory for the rest of the application.

A workaround for the problem was to reduce the cache size. I will
also fix the code to read through the record set twice: once to count
the records, and a second time to populate an array allocated with a
single malloc. It would be nice if you could count the keys in a key
range without having to read every record!

Thanks for taking an interest.

Gerard
m***@gmail.com
2007-05-24 12:01:37 UTC
Permalink
Hi Gerard,
Post by Gerard J. Nicol
So in the end it was not the database environment as such. It was the
database environment sucking up the RAM and leaving insufficient
contiguous memory for the rest of the application.
You might want to investigate the "/3GB" switch when starting Windows:

http://technet.microsoft.com/en-us/library/e834e9c7-708c-43bf-b877-e14ae443ecbf.aspx

Alternatively, 64-bit Windows obviously doesn't have this restriction.
Post by Gerard J. Nicol
A workaround for the problem was to reduce the cache size. I will
also fix the code to read through the record set twice: once to count
the records, and a second time to populate an array allocated with a
single malloc. It would be nice if you could count the keys in a key
range without having to read every record!
The DB_RECNUM flag to Berkeley DB can give you this count, but
maintaining the record numbers effectively prevents concurrent
updates.

Regards,
Michael.

Philip Guenther
2007-05-23 05:38:35 UTC
Permalink
Post by Gerard J. Nicol
One particular database has around 200,000 records and regularly
creates a cursor, positions at starting points and reads records until
the records go out of scope.
In one particular instance the application reads a set of records that
amount to over 50,000 records. Usually all is OK, but every few days
the read starts returning only exactly 40,000 records for the same
set. This continues to occur until the application is restarted and
the environment renewed. When the application is restarted the correct
number of records is returned.
What is the error returned by DBC->c_get() when it fails to return
another record?

Is the cursor inside a transaction? What flags are used when the
database environment and the database itself are opened?


Philip Guenther