Discussion:
Use $str or md5($str) as the key?
p***@gmail.com
2008-01-18 14:23:20 UTC
A few months ago I tested Berkeley DB with various configurations (B-tree
or Hash access method, $str or md5($str) as the key) and chose B-tree with
md5($str) as the key.
But now I tested again and got these results, inserting 3,041,977 records
into an empty DB:
1. Key: the string itself, up to ~72 chars ([A-Z0-9_|-]{1,72})
   200 s - Btree
   2000 s - Hash
2. Key: md5(string), 16 binary bytes
   900 s - Btree
   1000 s - Hash
3. Key: md5_hex(string), 32 hex chars ([a-f0-9]{32})
   1000 s - Btree
   1200 s - Hash

Why is that?
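A frequently cited explanation for this pattern (not confirmed anywhere in this thread) is locality of reference. A B-tree keeps keys in sorted order, so if consecutive input lines sort close together, successive inserts keep hitting the same leaf pages that are already in the cache. An MD5 digest is effectively random, so consecutive inserts land on unrelated pages, and the Hash access method scatters keys across buckets whatever form they take. The core-Perl sketch below models this crudely. It assumes a sorted input of made-up keys (the real input file is not shown) and a hypothetical "page" of 50 keys, and counts how often two consecutive inserts fall on different pages.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical input: 100,000 sorted, similar-looking keys. The poster's
# real input file is not shown, so this is only a stand-in.
my @raw = map { sprintf 'KEY_%08d', $_ } 1 .. 100_000;
my @md5 = map { md5($_) } @raw;

# Crude page model (an assumption, not how Berkeley DB lays out pages):
# a "page" is a run of $per_page neighbouring keys in final sorted order.
# Count how often two consecutive inserts fall on different pages.
sub page_switches {
    my ($keys, $per_page) = @_;
    my @sorted = sort @$keys;
    my %rank;
    @rank{@sorted} = 0 .. $#sorted;
    my ($switches, $prev) = (0, undef);
    for my $key (@$keys) {
        my $page = int($rank{$key} / $per_page);
        $switches++ if defined $prev && $page != $prev;
        $prev = $page;
    }
    return $switches;
}

printf "raw keys: %d page switches\n", page_switches(\@raw, 50);
printf "md5 keys: %d page switches\n", page_switches(\@md5, 50);
```

With sorted input the raw keys change page only once every 50 inserts (1999 switches for 100,000 keys), while the md5 keys change page on almost every insert. The model ignores page splits and cache size, so it only illustrates the direction of the effect, not the timings above.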

I used this very simple script (arguments: input file, database file, and
access method, Btree or Hash):
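#!/usr/bin/perl
use strict;
use warnings;
use 5.8.8;
use BerkeleyDB;
use Benchmark::Timer;
use Digest::MD5 qw/md5_hex md5/;

# Usage: $0 <input file> <database file> <Btree|Hash>
my ($input, $dbfile, $method) = @ARGV;
my $module = "BerkeleyDB::$method";

# 100 MB cache; DB_CREATE creates the database file if it does not exist.
my $bdbp = $module->new(
    -Filename  => $dbfile,
    -Cachesize => 100_000_000,
    -Flags     => DB_CREATE,
) or die "Cannot open '$dbfile': $BerkeleyDB::Error\n";

open(my $fh, '<', $input) or die "Can't open input file $input: $!\n";

my $t = Benchmark::Timer->new();
$t->start('ALL');

while (my $line = <$fh>) {
    chomp $line;
    my $UUID = uc $line;
    # Uncomment one of these (and comment out the last) to use a digest as the key:
    # my $status = $bdbp->db_put(md5_hex($UUID), $UUID, DB_NOOVERWRITE);
    # my $status = $bdbp->db_put(md5($UUID), $UUID, DB_NOOVERWRITE);
    my $status = $bdbp->db_put($UUID, $UUID, DB_NOOVERWRITE);
    # db_put returns 0 on success and DB_KEYEXIST for a duplicate key.
    warn "db_put failed for '$UUID': $status\n"
        if $status != 0 && $status != DB_KEYEXIST;
}

close($fh);
undef $bdbp;    # closes the database; flushing the cache is included in the timing

$t->stop('ALL');
print $t->report;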
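If md5 keys are kept, a commonly suggested workaround (not tested in this thread) is to sort the records by key before loading, so the B-tree receives keys in ascending order instead of random order. The sketch below assumes the lines are already chomped and fit in memory; `sorted_md5_pairs` is a hypothetical helper, and for millions of lines an external sort (for example sort(1) on md5_hex keys) may be more practical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper: build [key, value] pairs keyed by md5 of the
# upper-cased line, then sort them by key (byte order) so a B-tree
# would receive the keys in ascending order.
sub sorted_md5_pairs {
    my @pairs = map { my $v = uc $_; [ md5($v), $v ] } @_;
    return sort { $a->[0] cmp $b->[0] } @pairs;
}

my @lines = map { "uuid-$_" } 1 .. 5;    # stand-in for chomped input lines
my @pairs = sorted_md5_pairs(@lines);

# In the benchmark script the insert loop could then become, for example:
#   $bdbp->db_put($_->[0], $_->[1], DB_NOOVERWRITE) for @pairs;
printf "%s => %s\n", unpack('H*', $_->[0]), $_->[1] for @pairs;
```

Sorting trades one up-front pass (and memory) for sequential B-tree inserts; it does nothing for the Hash access method, which places keys by their hash value regardless of insertion order.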
a***@gmail.com
2008-01-21 06:54:18 UTC
Post by p***@gmail.com
> A few months ago I tested Berkeley DB with various configurations (B-tree
> or Hash access method, $str or md5($str) as the key) and chose B-tree with
> md5($str) as the key.
> Inserting 3,041,977 records into an empty DB:
> 1. Key: the string itself, up to ~72 chars ([A-Z0-9_|-]{1,72})
>    200 s - Btree
>    2000 s - Hash
> 2. Key: md5(string), 16 binary bytes
>    900 s - Btree
>    1000 s - Hash
> 3. Key: md5_hex(string), 32 hex chars ([a-f0-9]{32})
>    1000 s - Btree
>    1200 s - Hash
> Why is that?

It seems the Hash access method performs worse than B-tree when the data set
is small, but I don't know how large the data set has to be before Hash
becomes the better choice.
