Discussion:
Very slow record-retrieval through cursor
(too old to reply)
Rabbit
2007-08-30 07:54:36 UTC
Permalink
Hi,

I use Berkeley-DB to manage more than 250 dbs and every db has the
same structure (16 bytes key and 8 bytes data) and almost the same
acount of records (about 500,000). These dbs are opened in the same
environment with transaction enabled. When initialization, I have 4
threads to retrieve data from them through cursor simultaneously but
no two thread will operate on the same db (thread 1 reads db 0,
4, ...; thread 2 reads db 1, 5, ...; etc.). After a while, the cursor
read process becomes very slow, it need more than 20 seconds to finish
the read from a db.

My cache size is 4G and I use the default page size which is 4K.
Following is part of my DB_CONFIG:

set_cachesize 4 0 4
mutex_set_max 1572864
set_lk_max_lockers 4096
set_lk_max_objects 2048
set_lk_max_locks 8388608

I also used "valgrind --tool=callgrind" to profile my program. I got
the following result (I'm not familiar with valgrind so I don't know
how to can get a better profiling result):

1,741,812,564 ???:__memp_get_bucket [libdb-4.6.so]
1,727,316,262 ???:__db_tas_mutex_lock [libdb-4.6.so]
959,780,404 ???:__memp_fget [libdb-4.6.so]
888,693,549 ???:__env_alloc [libdb-4.6.so]
882,790,067 ???:__dbc_get [libdb-4.6.so]
798,835,644 ???:__db_tas_mutex_unlock [libdb-4.6.so]
722,079,676 ???:__os_id [libdb-4.6.so]
685,849,568 ???:__ham_func4 [libdb-4.6.so]
681,554,232 ???:__db_retcopy [libdb-4.6.so]
659,823,410 ???:__db_cursor_int [libdb-4.6.so]
526,604,190 ???:__memp_fput [libdb-4.6.so]
505,002,137 // my function
492,475,941 ???:__lock_get_internal [libdb-4.6.so]
455,045,660 ???:__db_ret [libdb-4.6.so]
451,725,119 ???:__db_lget [libdb-4.6.so]
449,867,405 ???:__bamc_next [libdb-4.6.so]
448,777,043 ???:__lock_getobj [libdb-4.6.so]
400,442,895 ???:__dbc_close [libdb-4.6.so]
359,487,207 ???:__dbc_get_arg [libdb-4.6.so]
348,992,625 // ...........
332,184,228 ???:__dbc_idup [libdb-4.6.so]
327,633,228 ???:__bamc_get [libdb-4.6.so]
323,084,790 ???:__lock_put [libdb-4.6.so]
313,985,666 ???:memcpy [/lib/libc-2.3.6.so]
295,780,327 ???:__dbc_get_pp [libdb-4.6.so]
286,679,279 ???:__dbc_cleanup [libdb-4.6.so]
285,409,596 ???:__lock_open [libdb-4.6.so]
284,264,251 ???:__logc_get_int [libdb-4.6.so]
277,579,162 ???:__bamc_close [libdb-4.6.so]
266,029,764 ???:pthread_self [/lib/libpthread-2.3.6.so]
245,727,648 ???:__lock_get [libdb-4.6.so]
213,873,198 ???:__bamc_refresh [libdb-4.6.so]
180,765,978 ???:__lock_put_internal [libdb-4.6.so]
178,388,632 // ...........
133,158,636 // ...........
128,302,983 // ...........
118,312,090 ???:__dbt_userfree [libdb-4.6.so]
109,718,273 ???:0x000000000006C270 [/lib/libc-2.3.6.so]
107,838,186 ???:__db_check_chksum [libdb-4.6.so]
101,720,408 ???:__db_pthread_mutex_init [libdb-4.6.so]

Someone can give me some advice? Thanks a lot!
Don Anderson
2007-08-30 18:58:16 UTC
Permalink
Hello,

I don't know valgrind either, but I know quite a bit about BDB. You
might want to ask
your question on the Oracle forums for BDB, in the meantime, here are
some thoughts.

You say 'After a while, the cursor read process becomes very slow'. I
guess that means
it is running fast until then? It sounds like you are doing all read
operations during this time?
If not, there may be some write operations holding locks - possible
contention. If you
are doing writing, are you using transactions?

Has your cache been prewarmed - that is, is there reason to believe
that the operations
before the slowdown are running fast because the cache is already
filled with pages
from previous accesses? A quick calculation shows that raw data size
alone:
(16+8) * 500,000 * 250 => 3G.
With typical overhead and page fill, the database sizes may be more
like 5 or 6 G (BDB
generally sacrifices space for speed) so if your testing is 'random'
and you are doing
complete database scans, it probably won't fit in 4G cache.

If it isn't a cache issue, then maybe there's a resource that is
leaking. Are all cursors
closed when finished, and temporary memory freed? You'd probably see
process size
starting to balloon if that is what is going on. Perhaps there's some
other resource
contention between the threads. What happens if you run with fewer
(or even one) thread?

Answers to any of these could be helpful clues. Another thing to look
at is DB statistics.
The first one to look at is the output from db_stat -m, which would
tell you about the cache,
and db_stat -Co, which can give some clues about contention. Run
these after the program
slows down. The output can be big, so capture it in a file.

- Don Anderson
Post by Rabbit
Hi,
I use Berkeley-DB to manage more than 250 dbs and every db has the
same structure (16 bytes key and 8 bytes data) and almost the same
acount of records (about 500,000). These dbs are opened in the same
environment with transaction enabled. When initialization, I have 4
threads to retrieve data from them through cursor simultaneously but
no two thread will operate on the same db (thread 1 reads db 0,
4, ...; thread 2 reads db 1, 5, ...; etc.). After a while, the cursor
read process becomes very slow, it need more than 20 seconds to finish
the read from a db.
My cache size is 4G and I use the default page size which is 4K.
set_cachesize 4 0 4
mutex_set_max 1572864
set_lk_max_lockers 4096
set_lk_max_objects 2048
set_lk_max_locks 8388608
I also used "valgrind --tool=callgrind" to profile my program. I got
the following result (I'm not familiar with valgrind so I don't know
1,741,812,564 ???:__memp_get_bucket [libdb-4.6.so]
1,727,316,262 ???:__db_tas_mutex_lock [libdb-4.6.so]
959,780,404 ???:__memp_fget [libdb-4.6.so]
888,693,549 ???:__env_alloc [libdb-4.6.so]
882,790,067 ???:__dbc_get [libdb-4.6.so]
798,835,644 ???:__db_tas_mutex_unlock [libdb-4.6.so]
722,079,676 ???:__os_id [libdb-4.6.so]
685,849,568 ???:__ham_func4 [libdb-4.6.so]
681,554,232 ???:__db_retcopy [libdb-4.6.so]
659,823,410 ???:__db_cursor_int [libdb-4.6.so]
526,604,190 ???:__memp_fput [libdb-4.6.so]
505,002,137 // my function
492,475,941 ???:__lock_get_internal [libdb-4.6.so]
455,045,660 ???:__db_ret [libdb-4.6.so]
451,725,119 ???:__db_lget [libdb-4.6.so]
449,867,405 ???:__bamc_next [libdb-4.6.so]
448,777,043 ???:__lock_getobj [libdb-4.6.so]
400,442,895 ???:__dbc_close [libdb-4.6.so]
359,487,207 ???:__dbc_get_arg [libdb-4.6.so]
348,992,625 // ...........
332,184,228 ???:__dbc_idup [libdb-4.6.so]
327,633,228 ???:__bamc_get [libdb-4.6.so]
323,084,790 ???:__lock_put [libdb-4.6.so]
313,985,666 ???:memcpy [/lib/libc-2.3.6.so]
295,780,327 ???:__dbc_get_pp [libdb-4.6.so]
286,679,279 ???:__dbc_cleanup [libdb-4.6.so]
285,409,596 ???:__lock_open [libdb-4.6.so]
284,264,251 ???:__logc_get_int [libdb-4.6.so]
277,579,162 ???:__bamc_close [libdb-4.6.so]
266,029,764 ???:pthread_self [/lib/libpthread-2.3.6.so]
245,727,648 ???:__lock_get [libdb-4.6.so]
213,873,198 ???:__bamc_refresh [libdb-4.6.so]
180,765,978 ???:__lock_put_internal [libdb-4.6.so]
178,388,632 // ...........
133,158,636 // ...........
128,302,983 // ...........
118,312,090 ???:__dbt_userfree [libdb-4.6.so]
109,718,273 ???:0x000000000006C270 [/lib/libc-2.3.6.so]
107,838,186 ???:__db_check_chksum [libdb-4.6.so]
101,720,408 ???:__db_pthread_mutex_init [libdb-4.6.so]
Someone can give me some advice? Thanks a lot!
Rabbit
2007-08-31 02:12:53 UTC
Permalink
Hi Don,

Thanks for your help! It's during the initialization period of my
program so there is no writing operations. I put detailed answers
below.
Post by Don Anderson
Hello,
I don't know valgrind either, but I know quite a bit about BDB. You
might want to ask
your question on the Oracle forums for BDB, in the meantime, here are
some thoughts.
You say 'After a while, the cursor read process becomes very slow'. I
guess that means
it is running fast until then? It sounds like you are doing all read
operations during this time?
It becomes slow very quickly, after reading couple of dbs.
Yes, all read operations.
Post by Don Anderson
If not, there may be some write operations holding locks - possible
contention. If you
are doing writing, are you using transactions?
No writing at this time.
Post by Don Anderson
Has your cache been prewarmed - that is, is there reason to believe
that the operations
before the slowdown are running fast because the cache is already
filled with pages
from previous accesses? A quick calculation shows that raw data size
(16+8) * 500,000 * 250 => 3G.
With typical overhead and page fill, the database sizes may be more
like 5 or 6 G (BDB
generally sacrifices space for speed) so if your testing is 'random'
and you are doing
complete database scans, it probably won't fit in 4G cache.
I think the cache is not prewarmed (how to prewarm it?).
I noticed a phenomena that, if I stop the program during the
initializaiton and start it again,
it will be very fast before it reaches where it's stopped last time.
After that, the db read becomes
slow again.
If I clean the evironment (just delete those __db* files), the db read
becomes slow very quickly.

The total physical files' size of these dbs is about 7.6G.
Yes I'm doing a complete database scan but I don't know whether it is
random.
I just put a DB_NEXT flag (sequence scan?) to the cursor and keep
reading until finished.

In my thought, the cache won't fit all the data, but is enough for one
db.
Post by Don Anderson
If it isn't a cache issue, then maybe there's a resource that is
leaking. Are all cursors
closed when finished, and temporary memory freed?
Yes
Post by Don Anderson
You'd probably see process size
starting to balloon if that is what is going on. Perhaps there's some
other resource
contention between the threads. What happens if you run with fewer
(or even one) thread?
It's also very slow even I run with only one thread.
If I clean the environment, it's very fast when read the first 6 or 7
dbs, but becomes slow after that.
Post by Don Anderson
Answers to any of these could be helpful clues. Another thing to look
at is DB statistics.
The first one to look at is the output from db_stat -m, which would
tell you about the cache,
The head part of db_stat -m:

4GB Total cache size
4 Number of caches
4 Maximum number of caches
1GB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
0 Requested pages mapped into the process' address space
24M Requested pages found in the cache (98%)
360522 Requested pages not found in the cache
0 Pages created in the cache
360521 Pages read into the cache
0 Pages written from the cache to the backing file
0 Clean pages forced from the cache
0 Dirty pages forced from the cache
0 Dirty pages written by trickle-sync thread
360510 Current total page count
360510 Current clean page count
0 Current dirty page count
524284 Number of hash buckets used for page location
24M Total number of times hash chains searched for a page
(24509250)
11 The longest hash chain searched for a page
28M Total number of hash chain entries checked for page (28559440)
0 The number of hash bucket locks that required waiting (0%)
0 The maximum number of times any hash bucket lock was waited
for (0%)
0 The number of region locks that required waiting (0%)
0 The number of buffers frozen
0 The number of buffers thawed
0 The number of frozen buffers freed
361545 The number of page allocations
0 The number of hash buckets examined during allocations
0 The maximum number of hash buckets examined for an allocation
0 The number of pages examined during allocations
0 The max number of pages examined for an allocation
0 Threads waited on page I/O
Post by Don Anderson
and db_stat -Co, which can give some clues about contention. Run
these after the program
slows down. The output can be big, so capture it in a file.
The output of db_stat -Co:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock Region type
8 Region ID
data/__db.008 Region name
0x2affad7b6000 Original region address
0x2affad7b6000 Region address
0x2affad7b6068 Region primary address
0 Region maximum allocation
0 Region allocated
Region allocations: 8394757 allocations, 0 failures, 0 frees, 1
longest
REGION_JOIN_OK Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker Mode Count Status ----------------- Object
---------------
17d READ 1 HELD #040.db handle 0

10b READ 1 HELD #002.db handle 0

114 READ 1 HELD #005.db handle 0

19c READ 1 HELD #050.db page 3987

126 READ 1 HELD #011.db handle 0

117 READ 1 HELD #006.db handle 0

165 READ 1 HELD #032.db handle 0

19c READ 1 HELD #050.db page 2584

10e READ 1 HELD #003.db handle 0

13e READ 1 HELD #019.db handle 0

171 READ 1 HELD #036.db handle 0

12f READ 1 HELD #014.db handle 0

15f READ 1 HELD #030.db handle 0

144 READ 1 HELD #021.db handle 0

186 READ 1 HELD #043.db handle 0

162 READ 1 HELD #031.db handle 0

12c READ 1 HELD #013.db handle 0

177 READ 1 HELD #038.db handle 0

174 READ 1 HELD #037.db handle 0

108 READ 1 HELD #001.db handle 0

105 READ 1 HELD #000.db handle 0

17a READ 1 HELD #039.db handle 0

13b READ 1 HELD #018.db handle 0

189 READ 1 HELD #044.db handle 0

102 READ 1 HELD sys.db handle
0

180 READ 1 HELD #041.db handle 0

156 READ 1 HELD #027.db handle 0

120 READ 1 HELD #009.db handle 0

159 READ 1 HELD #028.db handle 0

132 READ 1 HELD #015.db handle 0

138 READ 1 HELD #017.db handle 0

18f READ 1 HELD #046.db handle 0

123 READ 1 HELD #010.db handle 0

14d READ 1 HELD #024.db handle 0

19b READ 1 HELD #050.db handle 0

11d READ 1 HELD #008.db handle 0

168 READ 1 HELD #033.db handle 0

129 READ 1 HELD #012.db handle 0

198 READ 1 HELD #049.db handle 0

141 READ 1 HELD #020.db handle 0

135 READ 1 HELD #016.db handle 0

153 READ 1 HELD #026.db handle 0

147 READ 1 HELD #022.db handle 0

111 READ 1 HELD #004.db handle 0

150 READ 1 HELD #025.db handle 0

195 READ 1 HELD #048.db handle 0

11a READ 1 HELD #007.db handle 0

15c READ 1 HELD #029.db handle 0

192 READ 1 HELD #047.db handle 0

18c READ 1 HELD #045.db handle 0

16e READ 1 HELD #035.db handle 0

16b READ 1 HELD #034.db handle 0

14a READ 1 HELD #023.db handle 0

183 READ 1 HELD #042.db handle 0
Post by Don Anderson
- Don Anderson
Rabbit
2007-08-31 02:50:14 UTC
Permalink
Hi,

I've got another better profiling result:

436,042,669,130 * ???:__dbc_get_pp [libdb-4.6.so]
3,340,463,958 > ???:__dbt_userfree (128479383x) [libdb-4.6.so]
412,916,377,218 > ???:__dbc_get (128479383x) [libdb-4.6.so]
1,430 > ???:_dl_runtime_resolve (2x) [/lib/ld-2.3.6.so]
11,434,668,171 > ???:__dbc_get_arg (128479383x) [libdb-4.6.so]

412,916,377,218 < ???:__dbc_get_pp (128479383x) [libdb-4.6.so]
412,916,377,218 * ???:__dbc_get [libdb-4.6.so]
142,030,630,667 > ???:__dbc_cleanup (128479383x) [libdb-4.6.so]
153,069,782,408 > ???:__bamc_get (128479380x) [libdb-4.6.so]
33,918,583,319 > ???:__db_ret (256958252x) [libdb-4.6.so]
58,972,371,752 > ???:__dbc_idup (128479383x) [libdb-4.6.so]
1,379 > ???:_dl_runtime_resolve (2x) [/lib/ld-2.3.6.so]
22,554 > ???:__hamc_get (3x) [libdb-4.6.so]

153,069,782,408 < ???:__dbc_get (128479380x) [libdb-4.6.so]
153,069,782,408 * ???:__bamc_get [libdb-4.6.so]
143,816,284,749 > ???:__bamc_next (128479124x) [libdb-4.6.so]
2,987,931 > ???:__bamc_search (256x) [libdb-4.6.so]

143,816,284,749 < ???:__bamc_get (128479124x) [libdb-4.6.so]
143,816,284,749 * ???:__bamc_next [libdb-4.6.so]
62,037,643,630 > ???:__db_lget (130424302x) [libdb-4.6.so]
68,167,200,986 > ???:__memp_fget (130424302x) [libdb-4.6.so]
909,615,699 > ???:__memp_fput (1945178x) [libdb-4.6.so]

142,030,630,667 < ???:__dbc_get (128479383x) [libdb-4.6.so]
142,030,630,667 * ???:__dbc_cleanup [libdb-4.6.so]
73,857,483,410 > ???:__dbc_close (128479383x) [libdb-4.6.so]
60,078,946,392 > ???:__memp_fput (128479382x) [libdb-4.6.so]

37,647 < ???:__lock_get_internal (267x) [libdb-4.6.so]
152,224 < ???:__log_put (2144x) [libdb-4.6.so]
142 < ???:__dbreg_close_files (2x) [libdb-4.6.so]
71 < ???:__txn_preclose (1x) [libdb-4.6.so]
18,247 < ???:__dbreg_setup (257x) [libdb-4.6.so]
18,244,145,374 < ???:__db_cursor_int (256959794x) [libdb-4.6.so]
18,247 < ???:__db_refresh (257x) [libdb-4.6.so]
66,927,770 < ???:__memp_bhfree (935478x) [libdb-4.6.so]
40,754 < ???:__mutex_free_int (574x) [libdb-4.6.so]
9,272,281,937 < ???:__lock_get (128480158x) [libdb-4.6.so]
71 < ???:__lock_open (1x) [libdb-4.6.so]
710 < ???:__memp_sync_int (10x) [libdb-4.6.so]
207,498,639 < ???:__memp_alloc (2882711x) [libdb-4.6.so]
71 < ???:__log_env_refresh (1x) [libdb-4.6.so]
18,247 < ???:__db_env_setup (257x) [libdb-4.6.so]
1,420 < ???:__txn_getckp (20x) [libdb-4.6.so]
12,070 < ???:__memp_walk_files (170x) [libdb-4.6.so]
36,494 < ???:__memp_fclose (514x) [libdb-4.6.so]
54,741 < ???:__lock_id_free (771x) [libdb-4.6.so]
18,247 < ???:__dbreg_close_id (257x) [libdb-4.6.so]
142 < ???:__log_vtruncate (2x) [libdb-4.6.so]
710 < ???:__memp_close_flush_files (10x) [libdb-4.6.so]
18,247 < ???:__dbreg_add_dbentry (257x) [libdb-4.6.so]
74,525,923 < ???:__mutex_alloc_int (1049635x) [libdb-4.6.so]
36,494 < ???:__dbc_destroy (514x) [libdb-4.6.so]
1,035,713 < ???:__lock_getlocker (257x) [libdb-4.6.so]
710 < ???:__txn_updateckp (10x) [libdb-4.6.so]
18,244,145,374 < ???:__dbc_close (256959794x) [libdb-4.6.so]
36,494 < ???:__txn_begin_int (514x) [libdb-4.6.so]
18,247 < ???:__dbreg_rem_dbentry (257x) [libdb-4.6.so]
18,247 < ???:__db_create_internal (257x) [libdb-4.6.so]
18,247 < ???:__lock_freefamilylocker (257x) [libdb-4.6.so]
18,247 < ???:__dbreg_teardown_int (257x) [libdb-4.6.so]
138,180,555 < ???:__memp_pgread (1946205x) [libdb-4.6.so]
923 < ???:__dbreg_revoke_id_int (13x) [libdb-4.6.so]
2,908 < ???:__txn_checkpoint (20x) [libdb-4.6.so]
71 < ???:__memp_open (1x) [libdb-4.6.so]
36,494 < ???:__txn_end (514x) [libdb-4.6.so]
710 < ???:__dbreg_log_files (10x) [libdb-4.6.so]
71 < ???:__mutex_open (1x) [libdb-4.6.so]
1,491 < ???:__log_current_lsn (21x) [libdb-4.6.so]
18,247 < ???:__lock_downgrade (257x) [libdb-4.6.so]
276,366,170 < ???:__memp_fget (3892410x) [libdb-4.6.so]
139,734,353 < ???:__lock_vec (1946718x) [libdb-4.6.so]
72,988 < ???:__memp_fopen (1028x) [libdb-4.6.so]
39,192 < ???:__os_openhandle (552x) [libdb-4.6.so]
18,247 < ???:__txn_dref_fname (257x) [libdb-4.6.so]
7,526 < ???:__memp_mf_discard (106x) [libdb-4.6.so]
18,176 < ???:__log_check_page_lsn (256x) [libdb-4.6.so]
710 < ???:__txn_getactive (10x) [libdb-4.6.so]
39,192 < ???:__os_closehandle (552x) [libdb-4.6.so]
123,076 < ???:__lock_id (771x) [libdb-4.6.so]
11,159,090,885 < ???:__lock_put (128479900x) [libdb-4.6.so]
71 < ???:__log_get_cached_ckp_lsn (1x) [libdb-4.6.so]
18,247 < ???:__db_close (257x) [libdb-4.6.so]
71 < ???:__env_ref_decrement (1x) [libdb-4.6.so]
1,349 < ???:__log_flush_int (19x) [libdb-4.6.so]
18,658,577,983 < ???:__memp_get_bucket (262796873x) [libdb-4.6.so]
2,130 < ???:__logc_get_int (30x) [libdb-4.6.so]
18,247 < ???:__dbreg_new_id (257x) [libdb-4.6.so]
76,483,506,001 * ???:__db_tas_mutex_lock [libdb-4.6.so]
976 > ???:_dl_runtime_resolve (1x) [/lib/ld-2.3.6.so]
124,222,870 > ???:__db_pthread_mutex_lock (1046173x)
[libdb-4.6.so]
27,204,912,168 > ???:__os_id (1046342705x) [libdb-4.6.so]

73,857,483,410 < ???:__dbc_cleanup (128479383x) [libdb-4.6.so]
187,467 < ???:__dbc_close_pp (257x) [libdb-4.6.so]
329 < ???:__ham_open (1x) [libdb-4.6.so]
83,456 < ???:__bam_read_root (256x) [libdb-4.6.so]
73,857,754,662 * ???:__dbc_close [libdb-4.6.so]
18,244,145,374 > ???:__db_tas_mutex_lock (256959794x)
[libdb-4.6.so]
38,911,220,630 > ???:__bamc_close (128479892x) [libdb-4.6.so]
1,277 > ???:__hamc_close (5x) [libdb-4.6.so]
5,396,155,674 > ???:__db_tas_mutex_unlock (256959794x)
[libdb-4.6.so]

7,421 < ???:__ham_get_meta (4x) [libdb-4.6.so]
1,062,594 < ???:__bam_search (512x) [libdb-4.6.so]
531,392 < ???:__bam_get_root (256x) [libdb-4.6.so]
68,167,200,986 < ???:__bamc_next (130424302x) [libdb-4.6.so]
531,177 < ???:__bam_read_root (256x) [libdb-4.6.so]
5,348 < ???:__ham_get_cpage (4x) [libdb-4.6.so]
68,169,338,918 * ???:__memp_fget [libdb-4.6.so]
276,366,170 > ???:__db_tas_mutex_lock (3892410x) [libdb-4.6.so]
87,579,225 > ???:memset (1946205x) [/lib/libc-2.3.6.so]
1,537 > ???:_dl_runtime_resolve (2x) [/lib/ld-2.3.6.so]
34,319,202,545 > ???:__memp_get_bucket (134317744x) [libdb-4.6.so]
533,263,440 > ???:__memp_pgread (1946205x) [libdb-4.6.so]
2,861,542,929 > ???:__db_tas_mutex_unlock (136263949x)
[libdb-4.6.so]
995,536,481 > ???:__memp_alloc (1946205x) [libdb-4.6.so]

34,319,202,545 < ???:__memp_fget (134317744x) [libdb-4.6.so]
33,519,312,242 < ???:__memp_fput (130425334x) [libdb-4.6.so]
67,838,514,787 * ???:__memp_get_bucket [libdb-4.6.so]
18,658,577,983 > ???:__db_tas_mutex_lock (262796873x)
[libdb-4.6.so]

2,707 < ???:__ham_get_meta (4x) [libdb-4.6.so]
854,562 < ???:__bam_search (512x) [libdb-4.6.so]
161,694 < ???:__bam_get_root (256x) [libdb-4.6.so]
62,037,643,630 < ???:__bamc_next (130424302x) [libdb-4.6.so]
3,254 < ???:__ham_lock_bucket (4x) [libdb-4.6.so]
182,604 < ???:__bam_read_root (256x) [libdb-4.6.so]
62,038,848,451 * ???:__db_lget [libdb-4.6.so]
48,001,879,006 > ???:__lock_get (128479644x) [libdb-4.6.so]
1,282,589,718 > ???:__lock_vec (1945690x) [libdb-4.6.so]

60,078,946,392 < ???:__dbc_cleanup (128479382x) [libdb-4.6.so]
239,324 < ???:__bam_search (512x) [libdb-4.6.so]
909,615,699 < ???:__bamc_next (1945178x) [libdb-4.6.so]
1,568 < ???:__ham_release_meta (4x) [libdb-4.6.so]
120,079 < ???:__bam_read_root (256x) [libdb-4.6.so]
784 < ???:__hamc_get (2x) [libdb-4.6.so]
60,988,923,846 * ???:__memp_fput [libdb-4.6.so]
10,296,405,985 > ???:__memp_bucket_reorder (86843075x)
[libdb-4.6.so]
904 > ???:_dl_runtime_resolve (1x) [/lib/ld-2.3.6.so]
33,519,312,242 > ???:__memp_get_bucket (130425334x) [libdb-4.6.so]
2,738,932,014 > ???:__db_tas_mutex_unlock (130425334x)
[libdb-4.6.so]

58,972,371,752 < ???:__dbc_get (128479383x) [libdb-4.6.so]
58,972,371,752 * ???:__dbc_idup [libdb-4.6.so]
48,308,581,495 > ???:__db_cursor_int (128479383x) [libdb-4.6.so]
54 > ???:__hamc_dup (3x) [libdb-4.6.so]
1,284,793,800 > ???:__bamc_dup (128479380x) [libdb-4.6.so]
1,453 > ???:_dl_runtime_resolve (2x) [/lib/ld-2.3.6.so]

48,308,581,495 < ???:__dbc_idup (128479383x) [libdb-4.6.so]
606,280 < ???:__db_cursor (514x) [libdb-4.6.so]
48,309,187,775 * ???:__db_cursor_int [libdb-4.6.so]
282,390 > ???:__os_calloc (514x) [libdb-4.6.so]
2,221 > ???:__hamc_init (2x) [libdb-4.6.so]
6,038,558,508 > ???:__bamc_refresh (128479892x) [libdb-4.6.so]
11,565 > ???:__txn_record_fname (257x) [libdb-4.6.so]
182,332 > ???:__bamc_init (512x) [libdb-4.6.so]
18,244,145,374 > ???:__db_tas_mutex_lock (256959794x)
[libdb-4.6.so]
219,458 > ???:__lock_id (514x) [libdb-4.6.so]
771 > ???:__lock_set_thread_id (257x) [libdb-4.6.so]
4,380 > ???:_dl_runtime_resolve (5x) [/lib/ld-2.3.6.so]
6,682 > ???:__os_id (257x) [libdb-4.6.so]
5,396,155,674 > ???:__db_tas_mutex_unlock (256959794x)
[libdb-4.6.so]

213,866 < ???:__fop_lock_handle (257x) [libdb-4.6.so]
48,001,879,006 < ???:__db_lget (128479644x) [libdb-4.6.so]
156,410 < ???:__fop_file_setup (257x) [libdb-4.6.so]
48,002,249,282 * ???:__lock_get [libdb-4.6.so]
9,272,281,937 > ???:__db_tas_mutex_lock (128480158x)
[libdb-4.6.so]
28,524,858,133 > ???:__lock_get_internal (128480158x)
[libdb-4.6.so]
986 > ???:_dl_runtime_resolve (1x) [/lib/ld-2.3.6.so]
3,267,179,694 > ???:__db_tas_mutex_unlock (128480158x)
[libdb-4.6.so]

38,911,220,630 < ???:__dbc_close (128479892x) [libdb-4.6.so]
38,911,220,630 * ???:__bamc_close [libdb-4.6.so]
31,073,952,338 > ???:__db_lput (128479380x) [libdb-4.6.so]

33,918,583,319 < ???:__dbc_get (256958252x) [libdb-4.6.so]
33,918,583,319 * ???:__db_ret [libdb-4.6.so]
21,070,670,659 > ???:__db_retcopy (256958252x) [libdb-4.6.so]

1,001 < ???:__ham_item_init (3x) [libdb-4.6.so]
1,194 < ???:__ham_release_meta (4x) [libdb-4.6.so]
31,073,952,338 < ???:__bamc_close (128479380x) [libdb-4.6.so]
219 < ???:__ham_get_cpage (1x) [libdb-4.6.so]
31,073,954,752 * ???:__db_lput [libdb-4.6.so]
28,632,846,378 > ???:__lock_put (128479387x) [libdb-4.6.so]

28,524,858,133 < ???:__lock_get (128480158x) [libdb-4.6.so]
734,419,208 < ???:__lock_vec (1945690x) [libdb-4.6.so]
29,259,277,341 * ???:__lock_get_internal [libdb-4.6.so]
37,647 > ???:__db_tas_mutex_lock (267x) [libdb-4.6.so]
42 > ???:__lock_expired (3x) [libdb-4.6.so]
117,711 > ???:__mutex_alloc (261x) [libdb-4.6.so]
12,745,636,637 > ???:__lock_getobj (130425848x) [libdb-4.6.so]
2,608,523,646 > ???:__lock_ohash (130425848x) [libdb-4.6.so]
1,594 > ???:_dl_runtime_resolve (2x) [/lib/ld-2.3.6.so]
63 > ???:__db_tas_mutex_unlock (3x) [libdb-4.6.so]

242,010 < ???:__bam_read_root (256x) [libdb-4.6.so]
28,632,846,378 < ???:__db_lput (128479387x) [libdb-4.6.so]
102,460 < ???:__fop_file_setup (257x) [libdb-4.6.so]
28,633,190,848 * ???:__lock_put [libdb-4.6.so]
11,159,090,885 > ???:__db_tas_mutex_lock (128479900x)
[libdb-4.6.so]
5,088,247,444 > ???:__lock_put_internal (128479900x)
[libdb-4.6.so]
3,263,779,619 > ???:__db_tas_mutex_unlock (128479900x)
[libdb-4.6.so]

27,204,912,168 < ???:__db_tas_mutex_lock (1046342705x)
[libdb-4.6.so]
6,682 < ???:__db_cursor_int (257x) [libdb-4.6.so]
6,682 < ???:__txn_begin_int (257x) [libdb-4.6.so]
1,842 < ???:db_env_create (1x) [libdb-4.6.so]
8,840 < ???:__mutex_alloc_int (520x) [libdb-4.6.so]
17 < ???:__os_unique_id (1x) [libdb-4.6.so]
26,728 < ???:__lock_getlocker_int (1028x) [libdb-4.6.so]
27,204,962,959 * ???:__os_id [libdb-4.6.so]
7,324,409,729 > ???:pthread_self (1046344247x) [/lib/
libc-2.3.6.so]
3,657 > ???:_dl_runtime_resolve (2x) [/lib/ld-2.3.6.so]
5 > ???:getpid (1x) [/lib/libc-2.3.6.so]

63 < ???:__lock_get_internal (3x) [libdb-4.6.so]
45,024 < ???:__log_put (2144x) [libdb-4.6.so]
42 < ???:__dbreg_close_files (2x) [libdb-4.6.so]
21 < ???:__txn_preclose (1x) [libdb-4.6.so]
2,738,932,014 < ???:__memp_fput (130425334x) [libdb-4.6.so]
5,397 < ???:__dbreg_setup (257x) [libdb-4.6.so]
5,396,155,674 < ???:__db_cursor_int (256959794x) [libdb-4.6.so]
5,397 < ???:__db_refresh (257x) [libdb-4.6.so]
39,293,732 < ???:__memp_bhfree (1870942x) [libdb-4.6.so]
12,054 < ???:__mutex_free_int (574x) [libdb-4.6.so]
3,267,179,694 < ???:__lock_get (128480158x) [libdb-4.6.so]
21 < ???:__lock_open (1x) [libdb-4.6.so]
210 < ???:__memp_sync_int (10x) [libdb-4.6.so]
40,914,136 < ???:__memp_alloc (1947247x) [libdb-4.6.so]
21 < ???:__log_env_refresh (1x) [libdb-4.6.so]
5,397 < ???:__db_env_setup (257x) [libdb-4.6.so]
420 < ???:__txn_getckp (20x) [libdb-4.6.so]
3,570 < ???:__memp_walk_files (170x) [libdb-4.6.so]
9,681 < ???:__memp_fclose (461x) [libdb-4.6.so]
16,191 < ???:__lock_id_free (771x) [libdb-4.6.so]
5,397 < ???:__dbreg_close_id (257x) [libdb-4.6.so]
42 < ???:__log_vtruncate (2x) [libdb-4.6.so]
210 < ???:__memp_close_flush_files (10x) [libdb-4.6.so]
5,397 < ???:__dbreg_add_dbentry (257x) [libdb-4.6.so]
22,042,335 < ???:__mutex_alloc_int (1049635x) [libdb-4.6.so]
10,794 < ???:__dbc_destroy (514x) [libdb-4.6.so]
7,420 < ???:__lock_getlocker (257x) [libdb-4.6.so]
210 < ???:__txn_updateckp (10x) [libdb-4.6.so]
5,396,155,674 < ???:__dbc_close (256959794x) [libdb-4.6.so]
10,919 < ???:__txn_begin_int (514x) [libdb-4.6.so]
5,397 < ???:__dbreg_rem_dbentry (257x) [libdb-4.6.so]
5,397 < ???:__db_create_internal (257x) [libdb-4.6.so]
6,349 < ???:__lock_freefamilylocker (257x) [libdb-4.6.so]
5,397 < ???:__dbreg_teardown_int (257x) [libdb-4.6.so]
40,870,305 < ???:__memp_pgread (1946205x) [libdb-4.6.so]
273 < ???:__dbreg_revoke_id_int (13x) [libdb-4.6.so]
420 < ???:__txn_checkpoint (20x) [libdb-4.6.so]
21 < ???:__memp_open (1x) [libdb-4.6.so]
10,794 < ???:__txn_end (514x) [libdb-4.6.so]
210 < ???:__dbreg_log_files (10x) [libdb-4.6.so]
5,411 < ???:__lock_promote (3x) [libdb-4.6.so]
21 < ???:__mutex_open (1x) [libdb-4.6.so]
441 < ???:__log_current_lsn (21x) [libdb-4.6.so]
6,349 < ???:__lock_downgrade (257x) [libdb-4.6.so]
2,861,542,929 < ???:__memp_fget (136263949x) [libdb-4.6.so]
50,438,855 < ???:__lock_vec (1946718x) [libdb-4.6.so]
21,588 < ???:__memp_fopen (1028x) [libdb-4.6.so]
11,592 < ???:__os_openhandle (552x) [libdb-4.6.so]
5,397 < ???:__txn_dref_fname (257x) [libdb-4.6.so]
3,339 < ???:__memp_mf_discard (159x) [libdb-4.6.so]
5,376 < ???:__log_check_page_lsn (256x) [libdb-4.6.so]
210 < ???:__txn_getactive (10x) [libdb-4.6.so]
11,592 < ???:__os_closehandle (552x) [libdb-4.6.so]
20,020 < ???:__lock_id (771x) [libdb-4.6.so]
3,263,779,619 < ???:__lock_put (128479900x) [libdb-4.6.so]
21 < ???:__log_get_cached_ckp_lsn (1x) [libdb-4.6.so]
5,397 < ???:__db_close (257x) [libdb-4.6.so]
21 < ???:__env_ref_decrement (1x) [libdb-4.6.so]
399 < ???:__log_flush_int (19x) [libdb-4.6.so]
630 < ???:__logc_get_int (30x) [libdb-4.6.so]
5,397 < ???:__dbreg_new_id (257x) [libdb-4.6.so]
23,117,586,324 * ???:__db_tas_mutex_unlock [libdb-4.6.so]
1,096,400,545 > ???:__db_pthread_mutex_unlock (9598720x)
[libdb-4.6.so]
855 > ???:_dl_runtime_resolve (1x) [/lib/ld-2.3.6.so]

21,070,670,659 < ???:__db_ret (256958252x) [libdb-4.6.so]
208,933,932 < ???:__logc_get_int (1891922x) [libdb-4.6.so]
21,279,604,591 * ???:__db_retcopy [libdb-4.6.so]
96,799 > ???:__os_realloc (542x) [libdb-4.6.so]
5,230,793,756 > ???:memcpy (258850172x) [/lib/libc-2.3.6.so]
Rabbit
2007-08-31 04:06:55 UTC
Permalink
Post by Rabbit
I noticed a phenomena that, if I stop the program during the
initializaiton and start it again,
it will be very fast before it reaches where it's stopped last time.
After that, the db read becomes
slow again.
If I clean the evironment (just delete those __db* files), the db read
becomes slow very quickly.
I think this is because of the operating system's cache, because there
is no I/O during this "fast" period. I think it has no relation with
the environment after a few times of tests.
Rabbit
2007-08-31 09:12:55 UTC
Permalink
Very strange! I simply read through the data file (use std::ifstream)
before I retieve data through cursor, it becomes very fast! Seems it
depends on the file system's cache heavily!
Don Anderson
2007-09-01 03:31:33 UTC
Permalink
Hello Rabbit,

Some comments inline.
Post by Rabbit
Hi Don,
Thanks for your help! It's during the initialization period of my
program so there is no writing operations. I put detailed answers
below.
Post by Don Anderson
Hello,
I don't know valgrind either, but I know quite a bit about BDB. You
might want to ask
your question on the Oracle forums for BDB, in the meantime, here are
some thoughts.
You say 'After a while, the cursor read process becomes very slow'. I
guess that means
it is running fast until then? It sounds like you are doing all read
operations during this time?
It becomes slow very quickly, after reading couple of dbs.
Yes, all read operations.
Post by Don Anderson
If not, there may be some write operations holding locks - possible
contention. If you
are doing writing, are you using transactions?
No writing at this time.
Post by Don Anderson
Has your cache been prewarmed - that is, is there reason to believe
that the operations
before the slowdown are running fast because the cache is already
filled with pages
from previous accesses? A quick calculation shows that raw data size
(16+8) * 500,000 * 250 => 3G.
With typical overhead and page fill, the database sizes may be more
like 5 or 6 G (BDB
generally sacrifices space for speed) so if your testing is 'random'
and you are doing
complete database scans, it probably won't fit in 4G cache.
I think the cache is not prewarmed (how to prewarm it?).
DB cache is a simple cache of database pages, when the
cache fills up the least recently used page is kicked out.
Prewarming just means that you've accessed the database
in advance so the pages you need are already there.
Post by Rabbit
I noticed a phenomena that, if I stop the program during the
initializaiton and start it again,
it will be very fast before it reaches where it's stopped last time.
After that, the db read becomes
slow again.
If I clean the evironment (just delete those __db* files), the db read
becomes slow very quickly.
This all indicates that the performance differences are explained
by things being in cache or out of cache. Memory speeds are
a few orders of magnitude faster than disk speeds.
Post by Rabbit
The total physical files' size of these dbs is about 7.6G.
Yes I'm doing a complete database scan but I don't know whether it is
random.
I just put a DB_NEXT flag (sequence scan?) to the cursor and keep
reading until finished.
In my thought, the cache won't fit all the data, but is enough for one
db.
The BDB cache operates differently than you probably think.
It is not: here the guy is walking through a cursor, so I can predict
what the next access should be and prefetch the next page.
It is more like - the cursor visits all the data that is stored on
a page and when it is done, it must read the next page in the DB
file - that's a disk op that must wait until compilation.

One thing I have to ask - is this the typical operation for your
application,
reading sequentially through an entire database? Because we are
worried
about optimizing this, let's be sure this is the important use case.
What is the point for the scan? If you are searching for something,
you might want to set up a secondary table instead to find your your
data in a fraction of the time. Or if you need some tabulation (like
a
total of a column), you can maintain that as you are building the
data.
These things have costs of course, you just need to choose the
use cases that matter to you.

For optimizing this particular thing, you might try: using a BTREE
database if you aren't already. That alone probably won't fix
anything,
but you could then reload the database:
<shutdown the database environment and application>
db_dump foo.db | db_load foo2.db
mv foo2.db foo.db
<start the application>

That gets the order of the pages in the DB file to closely match
the logical order of the keys, and a sequential scan would
probably work best with most OS's which will do prefetching,
even if BDB isn't.
Post by Rabbit
Post by Don Anderson
If it isn't a cache issue, then maybe there's a resource that is
leaking. Are all cursors
closed when finished, and temporary memory freed?
Yes
Post by Don Anderson
You'd probably see process size
starting to balloon if that is what is going on. Perhaps there's some
other resource
contention between the threads. What happens if you run with fewer
(or even one) thread?
It's also very slow even I run with only one thread.
If I clean the environment, it's very fast when read the first 6 or 7
dbs, but becomes slow after that.
That is the data that doesn't mesh with a typical cache problem -
but if the OS is also caching than the OS would 'remember'
a previous run.

Another thing to try:
Use the DB_ENV->set_mp_mmapsize() API with a small number,
maybe 1 megabyte. This only makes a difference if you
are opening your databases readonly.

Also -- how big is the memory on your system? If it is not much
more than 4 GB, then you have made your BDB cache too large,
you are thrashing. Regardless, you might want to experiment with a
*smaller*
BDB cache to make sure.
Post by Rabbit
Post by Don Anderson
Answers to any of these could be helpful clues. Another thing to look
at is DB statistics.
The first one to look at is the output from db_stat -m, which would
tell you about the cache,
4GB Total cache size
4 Number of caches
4 Maximum number of caches
1GB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
0 Requested pages mapped into the process' address space
24M Requested pages found in the cache (98%)
360522 Requested pages not found in the cache
0 Pages created in the cache
360521 Pages read into the cache
0 Pages written from the cache to the backing file
0 Clean pages forced from the cache
0 Dirty pages forced from the cache
0 Dirty pages written by trickle-sync thread
360510 Current total page count
360510 Current clean page count
0 Current dirty page count
524284 Number of hash buckets used for page location
24M Total number of times hash chains searched for a page
(24509250)
11 The longest hash chain searched for a page
28M Total number of hash chain entries checked for page (28559440)
0 The number of hash bucket locks that required waiting (0%)
0 The maximum number of times any hash bucket lock was waited
for (0%)
0 The number of region locks that required waiting (0%)
0 The number of buffers frozen
0 The number of buffers thawed
0 The number of frozen buffers freed
361545 The number of page allocations
0 The number of hash buckets examined during allocations
0 The maximum number of hash buckets examined for an allocation
0 The number of pages examined during allocations
0 The max number of pages examined for an allocation
0 Threads waited on page I/O
No big surprises. Your cache hit rate is high - 98%.
But because of the high volume, you still have 360522 misses,
these are all waiting on disk. To verify this is the issue,
perhaps do db_stat -m once every 10 seconds while you
are running your test, and correlate it to your response time.
The numbers in db_stat -m are cumulative, so you should
be able to see lots of cache misses exactly during the times
of the slowdown. With a little arithmetic, you can see what your
disk access time is. Perhaps there are good reasons it is slow -
is the disk on a network drive?
Post by Rabbit
Post by Don Anderson
and db_stat -Co, which can give some clues about contention. Run
these after the program
slows down. The output can be big, so capture it in a file.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock Region type
8 Region ID
data/__db.008 Region name
0x2affad7b6000 Original region address
0x2affad7b6000 Region address
0x2affad7b6068 Region primary address
0 Region maximum allocation
0 Region allocated
Region allocations: 8394757 allocations, 0 failures, 0 frees, 1
longest
REGION_JOIN_OK Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locker Mode Count Status ----------------- Object
---------------
17d READ 1 HELD #040.db handle 0
10b READ 1 HELD #002.db handle 0
114 READ 1 HELD #005.db handle 0
19c READ 1 HELD #050.db page 3987
126 READ 1 HELD #011.db handle 0
117 READ 1 HELD #006.db handle 0
165 READ 1 HELD #032.db handle 0
19c READ 1 HELD #050.db page 2584
10e READ 1 HELD #003.db handle 0
13e READ 1 HELD #019.db handle 0
171 READ 1 HELD #036.db handle 0
12f READ 1 HELD #014.db handle 0
15f READ 1 HELD #030.db handle 0
144 READ 1 HELD #021.db handle 0
186 READ 1 HELD #043.db handle 0
162 READ 1 HELD #031.db handle 0
12c READ 1 HELD #013.db handle 0
177 READ 1 HELD #038.db handle 0
174 READ 1 HELD #037.db handle 0
108 READ 1 HELD #001.db handle 0
105 READ 1 HELD #000.db handle 0
17a READ 1 HELD #039.db handle 0
13b READ 1 HELD #018.db handle 0
189 READ 1 HELD #044.db handle 0
102 READ 1 HELD sys.db handle
0
180 READ 1 HELD #041.db handle 0
156 READ 1 HELD #027.db handle 0
120 READ 1 HELD #009.db handle 0
159 READ 1 HELD #028.db handle 0
132 READ 1 HELD #015.db handle 0
138 READ 1 HELD #017.db handle 0
18f READ 1 HELD #046.db handle 0
123 READ 1 HELD #010.db handle 0
14d READ 1 HELD #024.db handle 0
19b READ 1 HELD #050.db handle 0
11d READ 1 HELD #008.db handle 0
168 READ 1 HELD #033.db handle 0
129 READ 1 HELD #012.db handle 0
198 READ 1 HELD #049.db handle 0
141 READ 1 HELD #020.db handle 0
135 READ 1 HELD #016.db handle 0
153 READ 1 HELD #026.db handle 0
147 READ 1 HELD #022.db handle 0
111 READ 1 HELD #004.db handle 0
150 READ 1 HELD #025.db handle 0
195 READ 1 HELD #048.db handle 0
11a READ 1 HELD #007.db handle 0
15c READ 1 HELD #029.db handle 0
192 READ 1 HELD #047.db handle 0
18c READ 1 HELD #045.db handle 0
16e READ 1 HELD #035.db handle 0
16b READ 1 HELD #034.db handle 0
14a READ 1 HELD #023.db handle 0
183 READ 1 HELD #042.db handle 0
Since everyone is a reader, there are no surprises here.
If you ever see WAIT instead of HELD, that can be an indication of
contention.

- Don
Rabbit
2007-09-03 01:50:51 UTC
Permalink
Hi Don,

Thank you very much again! ^_^
Post by Don Anderson
Hello Rabbit,
Some comments inline.
DB cache is a simple cache of database pages, when the
cache fills up the least recently used page is kicked out.
Prewarming just means that you've accessed the database
in advance so the pages you need are already there.
This all indicates that the performance differences are explained
by things being in cache or out of cache. Memory speeds are
a few orders of magnitude faster than disk speeds.
The BDB cache operates differently than you probably think.
It is not: here the guy is walking through a cursor, so I can predict
what the next access should be and prefetch the next page.
It is more like - the cursor visits all the data that is stored on
a page and when it is done, it must read the next page in the DB
file - that's a disk op that must wait until compilation.
One thing I have to ask - is this the typical operation for your
application,
reading sequentially through an entire database? Because we are
worried
about optimizing this, let's be sure this is the important use case.
What is the point for the scan? If you are searching for something,
you might want to set up a secondary table instead to find your your
data in a fraction of the time. Or if you need some tabulation (like
a
total of a column), you can maintain that as you are building the
data.
These things have costs of course, you just need to choose the
use cases that matter to you.
Reading sequentially through an entire database is not the typical
operation in my application. But it is necessary when the application
startup. I need read all the data into memory, process them (index
them in my own way), and provide very fast query service to the
client.
Post by Don Anderson
For optimizing this particular thing, you might try: using a BTREE
database if you aren't already. That alone probably won't fix
anything,
<shutdown the database environment and application>
db_dump foo.db | db_load foo2.db
mv foo2.db foo.db
<start the application>
That gets the order of the pages in the DB file to closely match
the logical order of the keys, and a sequential scan would
probably work best with most OS's which will do prefetching,
even if BDB isn't.
This method seems good. I also doubt that the DB's page is in random
order because of lots of UPDATE operation on it. I will try it.
Post by Don Anderson
That is the data that doesn't mesh with a typical cache problem -
but if the OS is also caching than the OS would 'remember'
a previous run.
Use the DB_ENV->set_mp_mmapsize() API with a small number,
maybe 1 megabyte. This only makes a difference if you
are opening your databases readonly.
Unfortunately, it's a read-write database.
Post by Don Anderson
Also -- how big is the memory on your system? If it is not much
more than 4 GB, then you have made your BDB cache too large,
you are thrashing. Regardless, you might want to experiment with a
*smaller*
BDB cache to make sure.
The memory is 16GB, I think it big enough for my application.
Post by Don Anderson
No big surprises. Your cache hit rate is high - 98%.
But because of the high volume, you still have 360522 misses,
these are all waiting on disk. To verify this is the issue,
perhaps do db_stat -m once every 10 seconds while you
are running your test, and correlate it to your response time.
The numbers in db_stat -m are cumulative, so you should
be able to see lots of cache misses exactly during the times
of the slowdown. With a little arithmetic, you can see what your
disk access time is. Perhaps there are good reasons it is slow -
is the disk on a network drive?
Loading...