Discussion: Improving whole-database scan performance
Florian Weimer
2007-02-14 09:24:08 UTC
Permalink
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-committed isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.

Would using DB_RECNO or a sequence-based key be a suitable workaround?
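
For concreteness, the scan in question amounts to a cursor walk like the
following minimal C sketch (the file name "mydb.db" and the empty processing
loop are placeholders, not from this post). DB_NEXT returns records in key
order, which is what makes the I/O pattern effectively random once the file
is fragmented; in a transactional environment the cursor would additionally
be opened with DB_READ_COMMITTED (spelled DB_DEGREE_2 before release 4.5):

#include <db.h>
#include <string.h>

int
scan(DB_ENV *dbenv)
{
        DB *dbp;
        DBC *dbc;
        DBT key, data;
        int ret, t_ret;

        if ((ret = db_create(&dbp, dbenv, 0)) != 0)
                return (ret);
        if ((ret = dbp->open(dbp,
            NULL, "mydb.db", NULL, DB_BTREE, DB_RDONLY, 0)) != 0)
                goto err;
        if ((ret = dbp->cursor(dbp, NULL, &dbc, 0)) != 0)
                goto err;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        /* DB_NEXT traverses in key (index) order, not on-disk order. */
        while ((ret = dbc->get(dbc, &key, &data, DB_NEXT)) == 0)
                ;                       /* process the record */
        if (ret == DB_NOTFOUND)
                ret = 0;
        if ((t_ret = dbc->close(dbc)) != 0 && ret == 0)
                ret = t_ret;
err:    if ((t_ret = dbp->close(dbp, 0)) != 0 && ret == 0)
                ret = t_ret;
        return (ret);
}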
Christopher Layne
2007-02-14 14:04:34 UTC
Permalink
Post by Florian Weimer
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-committed isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.
Would using DB_RECNO or a sequence-based key be a suitable workaround?
What cache size are you using for the database? Default?
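
For reference, a minimal sketch of where that knob lives, assuming the
cache is configured with DB_ENV->set_cachesize() before the environment is
opened (the 512 MB figure and the open flags here are placeholders):

#include <db.h>

int
open_env(DB_ENV **dbenvp, const char *home)
{
        DB_ENV *dbenv;
        int ret;

        if ((ret = db_env_create(&dbenv, 0)) != 0)
                return (ret);
        /* gbytes = 0, bytes = 512 MB, ncache = 1. */
        if ((ret = dbenv->set_cachesize(dbenv,
            0, 512 * 1024 * 1024, 1)) != 0)
                goto err;
        if ((ret = dbenv->open(dbenv, home,
            DB_CREATE | DB_INIT_MPOOL, 0)) != 0)
                goto err;
        *dbenvp = dbenv;
        return (0);
err:    (void)dbenv->close(dbenv, 0);
        return (ret);
}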
Florian Weimer
2007-02-14 18:05:03 UTC
Permalink
Post by Christopher Layne
Post by Florian Weimer
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-committed isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.
Would using DB_RECNO or a sequence-based key be a suitable workaround?
What cache size are you using for the database? Default?
Uh, no, not quite. I've tried various values between 256 MB and
2.5 GB. And "db_dump -r" is acceptably fast (but plain db_dump
isn't).
Christopher Layne
2007-02-16 14:30:07 UTC
Permalink
Post by Florian Weimer
Post by Christopher Layne
Post by Florian Weimer
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-committed isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.
Would using DB_RECNO or a sequence-based key be a suitable workaround?
What cache size are you using for the database? Default?
Uh, no, not quite. I've tried various values between 256 MB and
2.5 GB. And "db_dump -r" is acceptably fast (but plain db_dump
isn't).
Here is what I see in the source for db_dump.c (4.4.20).
Pay specific attention to the logic of "is_salvage" inside db_init(): rflag
(set by the -r option) determines whether "is_salvage" is 1 or 0. Note also
that with -r the dump is dispatched into DB->verify(), which reads the file
page by page in physical order instead of walking the B-tree; that would
explain why the salvage dump stays fast. Finally, the default cache size
within db_dump is only 1 MB.

[these are the only places rflag is relevant:]
[...]
        /* Initialize the environment. */
        if (db_init(dbenv, home, rflag, cache, &private) != 0)
                goto err;

        /* Create the DB object and open the file. */
        if ((ret = db_create(&dbp, dbenv, 0)) != 0) {
                dbenv->err(dbenv, ret, "db_create");
                goto err;
        }

        /*
         * If we're salvaging, don't do an open; it might not be safe.
         * Dispatch now into the salvager.
         */
        if (rflag) {
                /* The verify method is a destructor. */
                ret = dbp->verify(dbp, argv[0], NULL, stdout,
                    DB_SALVAGE |
                    (Rflag ? DB_AGGRESSIVE : 0) |
                    (pflag ? DB_PRINTABLE : 0));
                dbp = NULL;
                if (ret != 0)
                        goto err;
                goto done;
        }
[...]

[local db_init:]

/*
 * db_init --
 *      Initialize the environment.
 */
int
db_init(dbenv, home, is_salvage, cache, is_privatep)
        DB_ENV *dbenv;
        char *home;
        int is_salvage;
        u_int32_t cache;
        int *is_privatep;
{
        int ret;

        /*
         * Try and use the underlying environment when opening a database.
         * We wish to use the buffer pool so our information is as up-to-date
         * as possible, even if the mpool cache hasn't been flushed.
         *
         * If we are not doing a salvage, we want to join the environment;
         * if a locking system is present, this will let us use it and be
         * safe to run concurrently with other threads of control. (We never
         * need to use transactions explicitly, as we're read-only.) Note
         * that in CDB, too, this will configure our environment
         * appropriately, and our cursors will (correctly) do locking as CDB
         * read cursors.
         *
         * If we are doing a salvage, the verification code will protest
         * if we initialize transactions, logging, or locking; do an
         * explicit DB_INIT_MPOOL to try to join any existing environment
         * before we create our own.
         */
        *is_privatep = 0;
        if ((ret = dbenv->open(dbenv, home,
            DB_USE_ENVIRON | (is_salvage ? DB_INIT_MPOOL : 0), 0)) == 0)
                return (0);
        if (ret == DB_VERSION_MISMATCH)
                goto err;

        /*
         * An environment is required because we may be trying to look at
         * databases in directories other than the current one. We could
         * avoid using an environment iff the -h option wasn't specified,
         * but that seems like more work than it's worth.
         *
         * No environment exists (or, at least no environment that includes
         * an mpool region exists). Create one, but make it private so that
         * no files are actually created.
         */
        *is_privatep = 1;
        if ((ret = dbenv->set_cachesize(dbenv, 0, cache, 1)) == 0 &&
            (ret = dbenv->open(dbenv, home,
            DB_CREATE | DB_INIT_MPOOL | DB_PRIVATE | DB_USE_ENVIRON, 0)) == 0)
                return (0);

        /* An environment is required. */
err:    dbenv->err(dbenv, ret, "DB_ENV->open");
        return (1);
}
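
The upshot, as a hedged sketch ("mydb.db" is a placeholder): to get the
page-order behaviour of "db_dump -r" from your own code, you can drive the
salvager directly, exactly as the utility does above:

#include <db.h>
#include <stdio.h>

int
salvage_dump(DB_ENV *dbenv)
{
        DB *dbp;
        int ret;

        if ((ret = db_create(&dbp, dbenv, 0)) != 0)
                return (ret);
        /*
         * DB->verify() is a destructor: dbp must not be used again,
         * whether the call succeeds or fails.
         */
        ret = dbp->verify(dbp, "mydb.db", NULL, stdout,
            DB_SALVAGE | DB_PRINTABLE);
        return (ret);
}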

Klaas
2007-02-15 01:55:40 UTC
Permalink
Post by Florian Weimer
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-committed isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.
How much of your database is located in overflow pages?
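
(One way to check, as a minimal sketch: DB->stat() on an open B-tree handle
reports the overflow page count in bt_over_pg; the db_stat utility prints
the same counter.)

#include <db.h>
#include <stdio.h>
#include <stdlib.h>

int
print_overflow(DB *dbp)
{
        DB_BTREE_STAT *sp;
        int ret;

        /* DB->stat allocates the statistics structure for us. */
        if ((ret = dbp->stat(dbp, NULL, &sp, 0)) != 0)
                return (ret);
        printf("overflow pages: %lu\n", (unsigned long)sp->bt_over_pg);
        free(sp);
        return (0);
}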

-Mike
Florian Weimer
2007-02-15 09:42:19 UTC
Permalink
Post by Klaas
Post by Florian Weimer
Is it possible to traverse a B-tree database (possibly in non-index
order, with read-committed isolation) in a way such that most accesses
happen in on-disk order? Due to internal fragmentation,
whole-database scans are annoyingly slow once the database size
exceeds available RAM.
How much of your database is located in overflow pages?
Very little; the records are quite small. I just looked at one of the
smaller database files, and it hasn't got any overflow pages. (The
larger database files contain similarly structured records.)