dyndb - a database for a single writer and multiple readers.
Uwe Ohse, 2000-02-06
Revision 2001-08-12: a record with a key and data length of zero has been subject to a deletion.

dyndb limitations and features:
* the database size is limited to 31 bits (2 gigabytes).
* keys and data are also limited to 31 bits.
* hash, key length, data length and pointers are stored in little endian byte order.
* although deletion of elements is possible, there is no way to reclaim the disk space of deleted records: the data becomes unavailable, but the space is not freed.
* a deleted record has a key and data length of zero.
* dyndb is limited to one writing process. There may be many readers at any time. An application may use file locking to get around that limitation.
* the default table layout is targeted to offer nice performance for about 50000 to 100000 elements (note that your system's seek and read performance might limit it even more). Above that, performance degrades drastically.
* keys need not be unique (the functions provided in the library will only return one record, even if the key occurs multiple times).

A dyndb file contains at least one table (the "level 0" table), a number of records, and possibly a lot of additional tables on levels 1 to 3. The level 0 table is at the start of the file, but no other order is guaranteed.

A table contains a number of pointers to data records or tables:
  The level 0 table contains 4096 pointers (using 12 bits of the hash).
  The level 1 table contains   64 pointers (using  6 bits of the hash).
  The level 2 table contains   32 pointers (using  5 bits of the hash).
  The level 3 table contains   16 pointers (using  4 bits of the hash).
It may be useful to increase the size of the level 0 and level 1 tables; see dyndb.h for how to do that (but note that dyndb has horrible cache locality - do not expect it to perform too well with lots of keys). A few bits of the hash, different for each level, are used as the index into a table.

A data record consists of
 a) a "next" pointer (the next element in a collision chain), 4 bytes long,
 b) a hash, 4 bytes long,
 c) the length of the key, 4 bytes long,
 d) the length of the data, 4 bytes long,
 e) the key,
 f) the data,
in that order.

"Pointers" are unsigned 32-bit integers. The highest bit has a special use, therefore a dyndb file cannot contain more than approximately 2 gigabytes. If the highest bit is set, the next element is a table; otherwise the next element is another data record. (A C sketch of the record header and the pointer encoding follows the splitting procedure below.)

If a collision chain would grow to 4 or more elements, another table is created, containing all elements of the chain. This operation involves multiple seeks, writes and reads, and is therefore not atomic. The implementation is careful not to make any record unavailable for lookups during that time (CC: collision chain, NP: next pointer, NT: new table, OT: old table):
step 0: create the new table.
step 1: change the CC's last element's NP to point to the new table.
step 2: go through the CC and insert pointers into NT where the slot in NT is empty.
step 3: change OT to point to NT.
step 4: go through the CC and add any element not already handled in step 2 to a new collision chain. The last element of each new CC still points to its old successor.
step 5: go through the new CCs and change the last elements' NPs to point to NT.
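As referenced above, here is a minimal C sketch of the record header and the pointer encoding. All names are mine (dyndb.h may well use different ones); the field order, the little-endian byte order and the meaning of the highest pointer bit are as described in the text.

    #include <stdint.h>

    /* hypothetical names; a record starts with four little-endian
       32-bit fields (16 bytes), followed by the key and the data */
    struct dyndb_record_header {
        uint32_t next;    /* next element in the collision chain, 0 = none */
        uint32_t hash;    /* hash of the key */
        uint32_t keylen;  /* keylen == 0 && datalen == 0 marks a deleted record */
        uint32_t datalen;
    };

    /* decode a stored little-endian 32-bit integer */
    static uint32_t dyndb_unpack(const unsigned char *p)
    {
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }

    /* highest bit set: the pointer refers to a table,
       otherwise it refers to a data record */
    static int dyndb_is_table(uint32_t ptr)
    {
        return (ptr & 0x80000000UL) != 0;
    }

    /* the remaining 31 bits are the file offset, which is why a
       dyndb file cannot grow beyond approximately 2 gigabytes */
    static uint32_t dyndb_offset(uint32_t ptr)
    {
        return ptr & 0x7fffffffUL;
    }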
This rather slow splitting procedure makes sure that any element already read (and possibly cached) by some other process still points either to the next element of the same hash (for the original table level), or to the new table. The "seek" function has to follow the collision chain until it reaches a table it has already seen, or a "0" next pointer.

The dyndb hash function is ``h = ((h << 5) + h) ^ c'', with a starting hash of 5381 (directly taken from cdb by djb@pobox.com); a C version follows the size samples below.

dyndb has an overhead of 16 bytes per record, plus 16384 bytes for the level 0 table, plus the space needed for any other table. Some samples (using key/record pairs of about 77 bytes length, a news message-id plus a cookie):

      100: size_in=   7788, size_out=  26624, ratio=0.292
      500: size_in=  39113, size_out=  63488, ratio=0.616
     1000: size_in=  77602, size_out= 122880, ratio=0.631
     2000: size_in= 156583, size_out= 212992, ratio=0.735
     5000: size_in= 389352, size_out= 499712, ratio=0.779
    10000: size_in= 777286, size_out=1064960, ratio=0.729
           (from that point on many second level tables are needed)
    20000: size_in=1548257, size_out=2449408, ratio=0.632
    50000: size_in=3594971, size_out=5414912, ratio=0.663
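For completeness, the hash written out as C. The formula and the starting value 5381 are taken verbatim from the text above; only the function name is an assumption.

    #include <stddef.h>
    #include <stdint.h>

    /* the dyndb/cdb hash: start at 5381, then h = ((h << 5) + h) ^ c
       for every byte c of the key */
    static uint32_t dyndb_hash(const unsigned char *key, size_t len)
    {
        uint32_t h = 5381;
        while (len--)
            h = ((h << 5) + h) ^ *key++;
        return h;
    }

A few bits of this hash, different for each level, then select the slot in the corresponding table, as described above.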