bug in 4.6.19 with cursors/duplicate keys
c***@gmail.com
2007-09-29 13:18:11 UTC
Dear list & oracle guys,

i'm pretty sure that i've found a bug in a database setup with cursors
and duplicates. I can reproduce this in several of my tests; a test
program to reproduce the bug is attached, see below.

This is what happens:

I insert and delete a lot of keys; duplicates are enabled. The keys
are just integers, the records have a variable size. I use a global
cursor object, and my test has about 1000 database operations.

here's the most important sequence from the test program:

insert 545
// other stuff
insert 558
// other stuff
insert 545 // insert 1st duplicate
// other stuff
erase 545 // delete 1st duplicate
// other stuff
erase 545 // delete 2nd duplicate
fullcheck()

As you can see, two keys are inserted: 545 and 558. No key between 545
and 558 is ever inserted, so these two keys are adjacent in sort order.
545 is inserted twice (two duplicates), and both duplicates are deleted
with a cursor. 558 is never deleted.

After these operations, I do what I call a "fullcheck": I use a
cursor to enumerate all keys in the database. This enumeration
does not find the key 558, although it's definitely in the database.
Instead, the cursor returns DB_NOTFOUND at that point, even though about
400 keys with a value > 558 are still in the database and have not yet
been visited.
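
To spell out what I expect the fullcheck to see, here is a toy model of
the sequence above. It is NOT Berkeley DB code and says nothing about
how the library works internally: it is just a sorted array with
duplicates, and every name in it (toy_db, toy_insert, toy_erase,
toy_fullcheck, replay_sequence) is made up for illustration. The "other
stuff" steps are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 1024

/* Hypothetical stand-in for a database with duplicates:
 * a sorted array of int keys. Not related to the Berkeley DB API. */
typedef struct {
    int keys[MAX_KEYS];
    size_t count;
} toy_db;

void toy_init(toy_db *db) { db->count = 0; }

/* Insert a key, keeping the array sorted; duplicates are allowed.
 * Returns 0 on success, -1 if the array is full. */
int toy_insert(toy_db *db, int key) {
    size_t pos = 0;
    if (db->count == MAX_KEYS)
        return -1;
    while (pos < db->count && db->keys[pos] <= key)
        pos++;
    memmove(&db->keys[pos + 1], &db->keys[pos],
            (db->count - pos) * sizeof(int));
    db->keys[pos] = key;
    db->count++;
    return 0;
}

/* Erase the first occurrence of a key (one duplicate at a time).
 * Returns 0 on success, -1 if the key is not present. */
int toy_erase(toy_db *db, int key) {
    for (size_t i = 0; i < db->count; i++) {
        if (db->keys[i] == key) {
            memmove(&db->keys[i], &db->keys[i + 1],
                    (db->count - i - 1) * sizeof(int));
            db->count--;
            return 0;
        }
    }
    return -1;
}

/* "fullcheck": visit every key in order. Returns the number of keys
 * visited and sets *seen to 1 if `expect` was among them. */
size_t toy_fullcheck(const toy_db *db, int expect, int *seen) {
    size_t visited = 0;
    *seen = 0;
    for (size_t i = 0; i < db->count; i++) {
        if (db->keys[i] == expect)
            *seen = 1;
        visited++;
    }
    return visited;
}

/* Replay the sequence from the post; returns 1 if 558 is still found. */
int replay_sequence(void) {
    toy_db db;
    int seen = 0;
    toy_init(&db);
    assert(toy_insert(&db, 545) == 0);
    assert(toy_insert(&db, 558) == 0);
    assert(toy_insert(&db, 545) == 0); /* insert 1st duplicate */
    assert(toy_erase(&db, 545) == 0);  /* delete 1st duplicate */
    assert(toy_erase(&db, 545) == 0);  /* delete 2nd duplicate */
    toy_fullcheck(&db, 558, &seen);
    return seen;
}
```

In this model replay_sequence() reports that 558 is still found after
both duplicates of 545 are gone, which is the result I expect from the
real cursor enumeration as well.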

Why do i think that this is a bug? Three reasons:

1. First of all, the behaviour above is not correct.

2. Second, if you look up the value 558 BEFORE calling fullcheck (with
cursor->c_get), everything works and all keys are found, not just the
keys < 545.

3. If you change the database to be in-memory, it works, too. So
there's a big difference in the cursor behaviour of an in-memory
database and a file-based database.

My test environment is a 64bit linux PC:
***@neuromancer ~/prj/hamsterdb-tests/trunk/env/posix $ uname -a
Linux neuromancer 2.6.17-gentoo-r8 #1 SMP Sun Sep 24 21:19:36 UTC 2006 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ AuthenticAMD GNU/Linux

I tested with db 4.5.20, but i just downloaded 4.6.19, and it has the
same behaviour.


Regards
Chris

PS: here is the file: http://www.crupp.de/db.c
i compile it with
gcc -g db.c -o db libdb-4.6.a -lpthread -Wall

all relevant lines are marked with comments. feel free to contact me
if you have questions.
c***@gmail.com
2007-10-09 15:46:24 UTC
*ping*

nobody here from oracle??
m***@gmail.com
2007-10-10 06:38:21 UTC
Hi Chris,
Post by c***@gmail.com
i'm pretty sure that i've found a bug in a database setup with cursors
and duplicates. I can reproduce this in several of my tests; a test
program to reproduce the bug is attached, see below.
Thanks for the test case, I have taken a look and can see what is
going on.

The basic issue is that the cursor delete operation is not entirely
complete until the cursor is closed or moved (otherwise the cursor's
position is not well-defined after a delete). So this issue only
applies in between a cursor delete and the next operation on that
cursor, and then only if there is another cursor performing a scan in
the same thread.

I have a fix that is being reviewed, I'll let you know when it's done.

Regards,
Michael Cahill, Oracle.

P.S. We monitor the OTN forum much more actively than this newsgroup,
now:

http://forums.oracle.com/forums/forum.jspa?forumID=271
