The dcache library

libdcache - dcache library reference

DESCRIPTION

The dcache library offers a low-level interface to a fixed sized permanent storage based cache, providing functions to enter, delete and search records. The cache strategie is round-robin.

On systems supporting large files the cache size is virtually unlimited, on other systems it’s limited by the maximum file size. The total length of a key and the record is limited to 2^31 bytes (2 GB).

Multiple processes may use the same cache, provided that they are carefully using some kind of locking (the library provides locking functions).

OVERVIEW

dcache_create_fd or dcache_create_name create a cache.

Use dcache_init to initialize the cache file, to prepare it for all other operations, and dcache_free when you are finished using the cache, to free all resources associated to it by this process.

dcache_enter adds a new record to the cache, deleting old records if needed.

dcache_lookupstart starts a new search, and dcache_lookupnext continues a search for a key.

dcache_delete deletes the last record found by dcache_lookupnext.

dcache_walkstart and dcache_walk go through the whole cache content, allowing to read all keys and records from the cache.

dcache_lck_excl , dcache_lck_share , dcache_lck_tryexcl , dcache_lck_tryshare and dcache_lck_unlock allow to lock and unlock the cache. Using them is quite important, so be sure to read the section named LOCKING below.

dcache_reload_vars must be used to re-read the in-memory copies of certain cache variables after a lock has been aquired. The application also has to call dcache_store_vars to save these variables before a lock is released.

dcache_flush makes all pending modifications to the cache visible to other processes. dcache_sync forces the operating system to write all pending modifications to the disk.

dcache_read reads something from the cache. dcache_seek moves a file pointer. dcache_fd returns the file descriptor of the cache. dcache_datapos, dcache_datalen, dcache_keypos, dcache_keylen, reports the position of key and data and the expiration times found by dcache_lookupnext or dcache_walk.

LOCKING

Some kind of locking has to be used if more than one process accesses the same cache at the same time. The dcache_lck_excl , dcache_lck_share , dcache_lck_tryexcl , dcache_lck_tryshare and dcache_lck_unlock functions may be used for this. They are implemented using fcntl. The library internally never calls them, therefore you might replace them with other locking primitives or semaphores.

A process trying to modify a cache has to hold an exclusive lock, while a process trying to find something from the cache has to hold a shared lock. Well-behaved processes release the lock as soon as possible.

exclusive access means that other processes will not be allowed to do any operation on the database in the time this process holds the lock.

Automatic Synchronization

The library by default tries to minimize the damage done by a crash and calls dcache_sync if some record has been deleted during dcache_enter and after a record has been added. This costs considerable performance - fsync() isn’t cheap at all. If this is not needed you can use the dcache_set_autosync(cache,0) to stop this.

Data Types

The dcache data type

This structure describes the cache. Please treat it as opaque data type.

The dcache_uint32 data type

An unsigned integer of 32 bits width.

The dcache_reclen data type

An signed integer of 32 bits width. It holds then length of objects in the cache.

The dcache_pos data type

An signed integer of 64 bits width. It holds then position of objects in the cache.

The dcache_data data type

This structure describes a part of or the whole data to be inserted into the cache. It contains the following elements:

const char *p;
int fd; /* ignored if p != NULL */
dcache_reclen len;
struct dcache_data_struct *next;

p, if it is not NULL, is a pointer to a string. The length of the string is given by len. If p is NULL then fd has to be a valid file descriptor, from which len bytes will be written. next points to the next of a list of dcache_data structures.

The dcache_delete_callback data type

This function type can be used to inform the application if a record is going to be deleted during a dcache_enter process. To inform the library about the callback function use the dcache_set_delete_callback function.

The callback function has to return 0 if everything is OK or -1 in case of error. Additional return codes may be introduced later.
If the function returns an error then dcache_enter will return -1, but neither delete the old record nor add the new one.

The callback function receives a pointer to a dcache structure. It may use the dcache_keypos, dcache_keylen, dcache_datapos, dcache_datalen and dcache_fd functions to get information about the record to be deleted. Key and data may be read from the file descriptor returned by dcache_fd in this order.

TRANSACTIONS

dcache supports transactions since version 0.6.0.
A transaction is started by calling dcache_trans_start, it may be cancelled by dcache_trans_cancel, and it will be commited by a call to dcache_trans_commit. The later saves the changes, if any, into the replay log. To actually make the changes to the cache the program has to call dcache_trans_replay. No further transactions may be started before dcache_trans_replay has finished successfully.
dcache transactions require cooperation between the applications. If the replay could, for whatever reason, not be completed, then it has to be continued before any other modification to the cache is done. If you break this rule the cache will still be okay, from a purely technical point of view, but may contain unexpected records. Example: an application could, during the transaction, check whether a key exists in the cache, and enter it only if it doesn’t. If you then insert that key into your cache before you execute the replay, then your cache will contain a duplicate key after the replay.

Transaction Details

If a single enter operation of the transaction has been executed, but the process is unable to mark that operation as done, then this operation will be tried again at the time dcache_trans_replay is called again. Impact: this record will be duplicated.
Workaround: Don’t turn off the power at that point. Fixing this is not worth the pain of an additional fsync or two.
This is not a problem for the delete operation, as it ignores deleted records.
Any modification done inside a transaction will be visible after a successful call to dcache_trans_replay.
dcache_trans_replay may be repeated as often as you wish, in order to complete it.
Modification of the data of existing cache entries are not supported.
Calls to a callback function, if any, will be done during the replay.
It is good practice to check whether a log replay is needed soon after a cache is opened. Use the dcache_trans_need_replay function for this.
do not forget to call dcache_flush or dcache_sync after dcache_trans_commit or dcache_trans_replay
Uncommited changes are stored in memory.
Commited, but not replayed, changes take up disk space.

Transaction Locking

Things are simple if you aquire a exclusive lock before you call dcache_trans_start and release if after dcache_trans_replay.

Things get more interesting if you do not want to lock the cache exclusively for the whole transaction. You then need two locks: a read lock for the cache, to be held whenever you access the cache, and an exclusive lock for the log, which has to be held for the whole period of the transaction (from start to commit), the read lock may released whenever you do not access the cache.

FUNCTIONS

dcache_create_fd

int dcache_create_fd(int fd, dcache_pos maxsize,: dcache_uint32 elements, int do_hole);

dcache_create_fd creates a disk cache on fd. Any data in the will be will lost. maxsize is the maximum number of data bytes in the cache. elements is the size of the hash table.

Note: 25% of the space in the hash table will not be used, but left free for performance reasons. The cache is likely to offer better performance if only 50% of the hash table is used.

If do_hole is set then the data area of the file is created with lseek, thus not allocating the disk space at this time. Otherwise the hole maxsize bytes are written. The usage of holes may lead to run-time problems if the free disk space is low. Holes are a kind of overcommitment and unsafe.

dcache_create_fd returns fd or -1 in case of error.

dcache_create_fd refuses to create hash tables with less then 4 records, with too many records (the hash and pointer tables need to fit into 4 billion bytes, but your operating system is likely to enpose more rigid limitations), or caches with a data size of less than one record with a one byte key. It returns EINVAL in this case.

The function does not lock the cache in any way.

dcache_create_name

int dcache_create_name(const char *fname, int mode,: dcache_pos maxsize, dcache_uint32 elements,; int do_hole);

dcache_create_name works like dcache_create_fd.

The cache will be named fname. An already existing file fname will be overwritten. fname will be deleted in case of an error. mode specifies the access rights of a newly created file (see open(2)).

dcache_datalen

dcache_reclen dcache_datalen(dcache *)

This function, implemented as a macro, returns the length of the data of the record found by the latest successful call to dcache_lookupnext or dcache_walk.

dcache_datapos

dcache_pos dcache_datapos(dcache *)

This function, implemented as a macro, returns the position of the data of the record found by the latest successful call to dcache_lookupnext or dcache_walk. It may be used as argument to, for example, lseek. Note: this position is invalid after any change to the cache has been done.

dcache_datatime

dcache_time dcache_datatime(dcache *)

This function, implemented as a macro, returns the expiration value of the record found by the latest successful call to dcache_lookupnext or dcache_walk.

dcache_delete

int dcache_delete(dcache *cache);

dcache_delete deletes the record found by the last call to dcache_lookupnext. It MUST only be called if no change to the database has been done since the last call to dcache_lookupstart.

The function MUST be called with exclusive locking turned on. The lock MUST have been aquired before the call to dcache_lookupstart.

It’s not safe to kill a process during the execution of this function.

dcache_delete returns -1 in case of error and 0 otherwise. errno is set to EINVAL if some values in the cache file were invalid.

dcache_enter

int dcache_enter(dcache *cache, const void *key,: dcache_reclen keylen, dcache_data *data,; dcache_time timevalue);

dcache_enter enters a record into the cache, freeing old records if space or table slots are needed. key and keylen describe the key, and data is a list of dcache_data structures holding information about the record data.

Key, data and record overhead (16 bytes) together must not exceed 2 GB (31 bits) of size. Records with empty keys (0 bytes length) cannot be entered into the database. The key obviously has to fit into the memory.

If an old record is to be deleted a callback function call be called to inform the application. See dcache_delete_callback for more information.

The application must hold a exclusive lock on the cache when dcache_enter is called.

dcache_fd

int dcache_fd(dcache *c)

This function, implemented as a macro, returns the file descriptor associated with c.

dcache_flush

int dcache_flush(dcache *cache)

A call to dcache_flush makes all pending modifications to the cache visible to other processes. The function returns 0 if successful and -1 on error. It’s safe to unlock the cache afterwards.

Note that dcache_flush calls msync and is expensive.

dcache_free

int dcache_free(dcache *cache)

dcache_free frees all resources the library allocated for cache. It should be called after all pending changes, if any, have been sent to the storage medium by means of dcache_flushor dcache_sync.

dcache_init

int dcache_init(dcache *cache, int fd, int forwrite);

dcache_init initializes the cache structure with content from fd. fd has to be a valid file descriptor, opened for access matching forwrite. dcache_init will call dcache_reload_vars.

If forwrite is not 0 then the cache header and hash space be treated as writable.

dcache_init returns 0 in case of success or -1 in case of error. In the case of a corrupted cache header EINVAL is set.

dcache_init does not lock the cache. The application has to do this before the call to dcache_init.

dcache_keylen

dcache_reclen dcache_keylen(dcache *)

This function, implemented as a macro, returns the length of the key of the record found by the latest successful call to dcache_lookupnext or dcache_walk.

dcache_keypos

dcache_pos dcache_keypos(dcache *)

This function, implemented as a macro, returns the position of the key of the record found by the latest successful call to dcache_lookupnext or dcache_walk. It may be used as argument to, for example, lseek. Note: this position is invalid after any change to the cache has been done.

dcache_lck_excl

int dcache_lck_excl(int fd);

The functions locks the cache behind fd (a valid file descriptor) for exclusive access. It returns 0 on success and -1 in case of failure. If the lock cannot be aquired instantly then the function will wait.

This function takes a file descriptor as argument because it (or one of the other dcache_lck_* functions) should be called before dcache_init is invoked.

dcache_lck_share

int dcache_lck_share(int fd);

The functions locks the cache behind fd (a valid file descriptor) for shared access. It returns 0 on success and -1 in case of failure. If the lock cannot be aquired instantly then the function will wait.

shared access means that other processes will be allowed to do read operations on the database in the time this process holds the lock.

dcache_lck_tryexcl

int dcache_lck_tryexcl(int fd);

dcache_lck_tryshare

int dcache_lck_tryshare(int fd);

dcache_lck_unlock

int dcache_lck_unlock(int fd);

The function released the lock on the cache behind fd. It returns 0 on success and -1 in case of failure.

dcache_lookupstart

void dcache_lookupstart(dcache *cache);

This function prepares the library for a new lookup and has to be called before dcache_lookupnext.

cache has to be initialized before and has to be locked.

dcache_lookupnext

int dcache_lookupnext(dcache *cache,const char *key,: dcache_reclen keylen);

dcache_lookupnext looks for the next record under key key in the cache. next is the nth record where n is the number of calls to dcache_lookupnext after the latest call to dcache_lookupstart.

The function return 1 of the record is found, 0 if the record is not found and -1 in case of error.

All calls of dcache_lookupnext must use the same key and keylen as all other calls do dcache_lookupnext since the latest call to dcache_lookupstart.

If the function returns 1 then the record my be read from the file descriptor associated with the cache (see dcache_fd). dcache_datalen returns the length of the data, and dcache_datatime returns it’s expiration time. dcache_datapos returns the position of the record in the cache, if you need that for seeking back.

dcache_lookupnext must be called while a lock is hold. It must not be called if a change to the database has been done after the latest call to dcache_lookupstart.

If you need to read the data found by dcache_lookupnext then you have to do this before the lock is released.

dcache_lookup

int dcache_lookup(dcache *cache,const char *key,: dcache_reclen keylen);

This is implemented as a call to dcache_lookupstart followed by a call to dcache_lookupnext. It finds the first record matching key. See above for a description of the locking requirements.

dcache_read

dcache_reclen dcache_read(int fd,: void *buf,dcache_reclen len);

dcache_read reads len bytes from the file descriptor fd. It returns -1 in case of error and len otherwise. If less than len bytes are available -1 is returned with errno set to EIO (I/O Error).

While the function is called a lock on the cache behind fd has to be held.

dcache_reload_vars

int dcache_reload_vars(dcache *cache);

dcache_reload_vars loads a number of variables describing the state of the cache from the shared memory area. It has to be called after a lock has been aquired and before any other action on the cache is done.

The function returns -1 in case of error (EINVAL means that the cache is corrupted) or 0 if succesful.

dcache_seek

dcache_pos dcache_seek(dcache *cache, dcache_pos xpos)

dcache_seek moved the file pointer of the file descriptor fd to pos. It returns -1 on error and 0 if successful. Note that this function does not report the old position. Use dcache_datapos or dcache_keypos for thi.s

Warning: Any use of dcache_seek to position the file descriptor at a position outside the range between the position returned by the last call to dcache_datapos or dcache_keypos and the end of the record is illegal.

This function is useful if you need to read key and/or data multiple times. Don’t use it otherwise.

This function must be called while the cache is locked, and must not be called if any modification to the cache has been done after the last call to dcache_lookupnext or dcache_walk.

dcache_set_autosync

void dcache_set_autosync(dcache *cache, int onoff)

Causes the library to automatically synchronize changes to the disk. If onoff is zero then no automatic synchronization will be done, any other value means the opposize.

Automatic synchronization means that, if an old record is deleted during during dcache_enter, dcache_flush will be called to flush any changes to the disk. This reduces the probability that a crash / shutdown during the process of entering the new data to the cache damages the database.

Note that still a crash at the totally wrong time can still eat the cache, this is just far more unlikely to happen.

This function may be called even if no lock is held.

dcache_set_delete_callback

void dcache_set_delete_callback(dcache *cache,: dcache_delete_callback cb)

When dcache_enter needs to delete a record it can call a callback function. The type of the callback function is dcache_delete_callback. More information may be found at the description of the type.

To delete an existing callback you can set the callback to a NULL pointer through another call to dcache_set_delete_callback.
When using transactions the callback function will be called during the log replay.

dcache_store_vars

void dcache_store_vars(dcache *cache);

This function stores the working copies of the variables in the cache header area to the shared memory area. It has to be called while an exclusive lock on the cache is held. A call to dcache_flush makes sure that the changes are flushed to the disk and visible to other processes.

dcache_sync

int dcache_sync(dcache *cache);

A call to dcache_sync causes all pending modifications to the cache cache to be written to the storage medium. The function will not return before this is finished. It will return 0 if successful and -1 on error.

It’s safe to unlock the cache afterwards. Obviously this function has to be called while a lock is held.

Note that dcache_sync calls fsync and msync and is generally very expensive.

dcache_trans_cancel

int dcache_trans_cancel(dcache *cache);

Cancels the pending changes: The will not be written into the transaction log. It return -1 in case of error, and 0 otherwise. The only possible error is that there is no open transaction.

dcache_trans_commit

int dcache_trans_commit(dcache *cache);

Commits the pending changes into the transaction log. It return -1 in case of error, and 0 otherwise.

dcache_trans_commit

int dcache_need_replay(dcache *cache);

Returns 1 if there is a committed, but not already replayed transaction, 0 if the is no such thing, and -1 in case of error.

dcache_trans_start

int dcache_trans_start(dcache *cache);

Starts a transaction. New records entered afterwards will not be visible until the transaction has been committed and replayed. Deletions will not be visible up to that point, too. It return -1 in case of error, and 0 otherwise.

dcache_trans_replay

int dcache_trans_replay(dcache *cache);

Takes committed changes, if any, and executes them. It returns -1 in case of error, and 0 if all changes have been done.

The function changes the cache. It is not safe to continue dcache_lookupstart, dcache_lookup, or dcache_lookupnext after dcache_trans_replay has been called.

dcache_walkstart

void dcache_walkstart(dcache *cache);

This function prepares the library for a walk through all elements of a cache.

cache has to be initialized and locked.

dcache_walk

int dcache_walk(dcache *cache, int *fd,: dcache_reclen *keylen, dcache_reclen *datalen,; dcache_time *expire);

dcache_walk fills fd, keylen, datalen and expire with information describing the next element in cache.

The function returns -1 if an error has occured. A return code of zero means there are no further records in the cache. One (1) means that the function successfully filled the variables with information about the record.

You may read the key from fd. It is keylen bytes long. Then, after you read the key, you can read the data from fd. It is datalen bytes long. expire is filled the the expiration value.

Any or all of the pointer variables may be NULL pointers. You may only read the key if at least fd and keylen are no NULL pointers, and you may only read the record data of fd, keylen and datalen are no NULL pointers.

dcache_walk may only be called if dcache_walkstart has been called before and if neither a change to the cache has happened since then nor the lock to the database has been released.

Note: You MUST NOT call dcache_delete after a call to dcache_walk. dcache_delete must only be used after a successful call to dcache_lookupnext.

EXAMPLES

Transaction Example

dcache cache;

int fd=open("somefile.dcache",O_RDWR);
if (-1==fd) die();
if (-1==dcache_init(&cache,fd,1)) die();
if (-1==dcache_lck_excl(fd)) die();
switch(dcache_need_replay(&cache)) {
case 0: break;
case -1: die();
default:
  if (-1==dcache_trans_replay(&cache)) die();
  if (-1==dcache_flush(&cache)) die();
}
if (-1==dcache_trans_start(&cache)) die();
/* this functions enters and deletes cache entries */
if (enters_and_delete_ok(&cache)) {
    if (-1==dcache_trans_commit(&cache)) die(); /* lose */
    if (-1==dcache_trans_replay(&cache)) die();
    if (-1==dcache_flush(&cache)) die();
} else {
    if (-1==dcache_trans_cancel(&cache)) die();
}

You might want to use dcache_sync instead of dcache_flush depending on how valuable your data is.

COMPATIBILITY

64 bit

dcache needs what’s often called "large file support". System vendors often express their incompetence by not using a clean solution ("just use 64bit everywhere"), but by providing two sets of system calls, one for small and one for large files. One has to define some compile time constants to get support for large files. See http://www.ohse.de/uwe/articles/lfs.html for my opinion about this.

The dcache package deals with that.
The library doesn’t try even try to do any magic (whatever it tries could be completely wrong - depending on what your package does). If you want to use the dcache library you have to deal this this yourself. You might find lfs.sh in the dcache package "src" directory helpful. It basically generates four files, on contains a define (for informational purposes only), one a set of cflags, one a set of ldflags (typically empty) and one contains the names of libraries to link into the program (if you are feeling bad about this i can understand you - what did the POSIX people think while they made the C library guess CFLAGS for a compiler when it doesn’t even know which one is installed?).
The simple solution is to just put -D_LARGEFILE_SOURCE -DFILE_OFFSET_BITS=64 into your compiler invocation and forget about the rest. But in any case be careful to compile all your modules with or all without this flag.

Note that you can forget this entire section if your files do not exceed 2GB.

mmap

dcache needs a reasonable implementation of memory mapped files. This should be no problem today, but might lead to corrupted caches on older systems.