SYNOPSIS

#include ``strhash.h''

int strhash_create(strhash *table, unsigned int mod, unsigned int startsize, strhash_hashfunc);

void strhash_destroy(strhash *table);

uint32 strhash_hash(const char *key, unsigned int len);

int strhash_enter (strhash *table, int keyalloc, const char *key, uint32 keylen, int dataalloc, const char *data, uint32 datalen);

void strhash_lookupstart(strhash *table);

int strhash_lookupnext(strhash *table, const char *key, uint32 keylen, char **data, uint32 *datalen);

int strhash_lookup(strhash *table, const char *key, uint32 keylen, char **data, uint32 *datalen);

void strhash_delete(strhash *table);

int strhash_change(strhash *table, int dataalloc, const char *data, uint32 datalen);

void strhash_walkstart(strhash *table);

int strhash_walk(strhash *table, char **key, uint32 *keylen, char **data, uint32 *datalen);

#include ``strhash_io.h''

int strhash_load(strhash *table, buffer *b);

int strhash_save(strhash *table, buffer *b);


DESCRIPTION

A strhash is an associative array, mapping keys to values. Keys and values may be strings of up to 2 gigabytes (2^31) bytes of length, containing any possibly character. Keys need not be unique.


FUNCTIONS

strhash_create
creates a hash table. mod is the table split to use. startsize is the starting size of a second level table. hashfunc is a hashing function.

strhash_create returns -1 in case of an error.

strhash_destroy
destroys a hash table, freeing all memory used by it. The table must not be used for further lookups, but may be used as argument to strhash_create to create a new table. startsize is the starting slot size. hashfunc is a hashing function.

strhash_hash
is a hash function. It's neither very good nor very bad, but is quite fast. CRC based algorithms usually give a better key distribution.

Note: The strhash library uses the modulus of the hash and the table split factor (in C terms: hash%mod) to select a second level table, and uses the remainder of the hash (hash / mod) modulus the second level size as a index into the second level table.

strhash_enter
enters data into the table table. The key is described by key and keylen, the value by data and datalen.

If keyalloc is set then strhash_enter creates a copy of the key, otherwise strhash_enter creates a reference to the key. key and keylen may be NULL respectively 0 if keyalloc is not set (0).

If dataalloc is set then strhash_enter creates a copy of the value, otherwise strhash_enter creates a reference to the value. data and datalen may be NULL respectively 0 if dataalloc is not set (0).

strhash_enter returns -1 in case of trouble, setting errno appropriately. ENOENT means that either datalen or keylen exceeded 31 bits. ENOMEM means that strhash_enter ran out of memory.

strhash_enter returns 1 if can enter the new key.

strhash_lookupstart
starts a new search.

strhash_lookupnext
looks for the next record with key key and keylen in the table. It returns 1 if it finds the record; otherwise it returns 0. All calls to strhash_lookupnext must use the same key and keylen as the first call after strhash_lookupstart.

After strhash_lookupnext returned 0 no further lookup may be done without first calling strhash_lookupstart.

If data is not a NULL pointer then strhash_lookupnext will put a pointer to the value into *data. If datalen is not a NULL pointer then strhash_lookupnext will put a pointer to the values length into *datalen.

strhash_lookup
first calls strhash_lookupstart and then calls strhash_lookupnext.

strhash_delete
deletes the record found by the last strhash_lookupnext operation.

strhash_change
changes the value of the record found by the last strhash_lookupnext operation. It will create a copy of data if dataalloc is set, or store a reference to data otherwise.

strhash_walkstart
starts an operation to walk through the table.

strhash_walk
walks through the table, returning for every record in it. It returns 1 if a record is found, and 0 otherwise.

If 1 is returned then pointers to and length of key and value of the record are stored in the appropriate parameter, provided it's not a NULL pointer.

strhash_load
appends the content of the buffer to the hash. In case of trouble it sets errno and returns -1. EPROTO means that the buffers content was badly formatted.

strhash_save
saves the content of the hash to the buffer. In case of trouble it sets errno and returns -1.


LAYOUT

The top level hash table contains mod slots (where mod is an argument given to the strhash_create function). Each slot points to a second level table or to NULL if the second level table is still empty (unused).

Each second level table is allocated as soon as it is needed. It holds a number of pointers to entries. The library grows the second level tables so that they are never more than 50% of the pointers in use. The library doesn't reduce the size of a second level table.

Each entry holds the hash of the key, a pointer to the data, then length of the data, the length of the key and either the key or a pointer to a key.


MEMORY USAGE

In short: quite high. On a system where pointers a 32 bits long each record eats 20 bytes for the entry, between 8 and 12 bytes for the second level table, plus the size of the key and the value.

For short keys or data of less then a pointer size length the memory needed is somewhat lower.

Additional memory is used during table splits.

Note: the malloc implementation in the C standard library may waste additional memory.


LIMITS

keys are limited to 31 bits of length.
values are limited to 31 bits of length.
the order of records is not preserved.


SEE ALSO

how to integrate strhash into your projects: strhash-integration(3), the strhash homepage