User-space interface to DLM
---------------------------
#include <sys/types.h>
#include <sys/stat.h>
#include <stdint.h>
#include <libdlm.h>

cc -D_REENTRANT prog.c  -ldlm -lpthread

cc prog.c  -ldlm_lt


There are basically two interfaces to libdlm. The first is the "dead simple"
one that has limited functionality and assumes that the application is linked
with pthreads. The second is the full-featured DLM interface that looks
identical to the kernel interface.

See CVS dlm/tests/usertest for examples of use of both these APIs.

The simple one
--------------
This provides two API calls, lock_resource() and unlock_resource(). Both of
these calls block until the lock operation has completed - using a worker
thread to deal with the callbacks that come from the kernel.

int lock_resource(const char *resource, int mode, int flags, int *lockid);

  This function locks a named (NUL-terminated) resource and returns the
  lockid if successful. The mode may be any of

    LKM_NLMODE LKM_CRMODE LKM_CWMODE LKM_PRMODE LKM_PWMODE LKM_EXMODE

  Flags may be any combination of

    LKF_NOQUEUE - Don't wait if the lock cannot be granted immediately,
                  will return EAGAIN if this is so.

    LKF_CONVERT - Convert lock to new mode. *lockid must be valid,
                  resource name is ignored.

    LKF_QUECVT  - Add conversion to the back of the convert queue - only
                  valid for some convert operations

    LKF_PERSISTENT - Don't automatically unlock this lock when the process
                     exits (must be root).


int unlock_resource(int lockid);

  Unlocks the resource.



The complicated one
-------------------
This interface is identical to the kernel interface with the exception of
the lockspace argument. All userland locks sit in the same lockspace by default.

libdlm can be used in pthread or non-pthread applications. For pthread
applications simply call the following function before doing any lock
operations. If you're using pthreads, remember to define _REENTRANT at the
top of the program or using -D_REENTRANT on the compile line.

int dlm_pthread_init()

  Creates a thread to receive all lock ASTs. The AST callback function
  for lock operations will be called in the context of this thread.
  If there is a potential for local resource access conflicts you must
  provide your own pthread-based locking in the AST routine.


int dlm_pthread_cleanup()

  Cleans up the default lockspace threads after use. Normally you 
  don't need to call this, but if the locking code is in a 
  dynamically loadable shared library this will probably be necessary.


For non-pthread based applications the DLM provides a file descriptor
that the program can feed into poll/select. If activity is detected
on that FD then a dispatch function should be called:

int dlm_get_fd()

  Returns a file-descriptor for the DLM suitable for passing in to
  poll() or select().

int dlm_dispatch(int fd)

  Reads from the DLM and calls any AST routines that may be needed.
  This routine runs in the context of the caller so no extra locking
  is needed to protect local resources.


int dlm_lock(uint32_t mode,
	     struct dlm_lksb *lksb,
	     uint32_t flags,
	     const void *name,
	     unsigned int namelen,
	     uint32_t parent,
	     void (*astaddr) (void *astarg),
	     void *astarg,
	     void (*bastaddr) (void *astarg),
	     struct dlm_range *range);
	      

mode		lock mode:
                LKM_NLMODE 	NULL Lock
		LKM_CRMODE    	Concurrent read
		LKM_CWMODE    	Concurrent write
		LKM_PRMODE    	Protected read
		LKM_PWMODE    	Protected write
		LKM_EXMODE    	Exclusive

flags		LKF_NOQUEUE	Don't queue the lock. If it cannot be
				granted return -EAGAIN
		LKF_CONVERT	Convert an existing lock
		LKF_VALBLK	Lock has a value block
		LKF_QUECVT	Put conversion to the back of the queue
		LKF_EXPEDITE    Grant a NL lock immediately regardless of
				other locks on the conversion queue
		LKF_PERSISTENT  Specifies a lock that will
                                not be unlocked when the process exits.

lksb		Lock status block.
		This structure contains the returned lock ID, the actual
		status of the lock operation (all lock ops are asynchronous)
		and the value block if LKF_VALBLK is set.

name		Name of the lock. Can be binary, max 64 bytes. Ignored for lock
		conversions.

namelen		Length of the above name. Ignored for lock conversions.

parent		ID of parent lock or NULL if this is a top-level lock

ast		Address of AST routine to be called when the lock operation
		completes. The final completion status of the lock will be
		in the lksb. the AST routine must not be NULL.
		
astargs		Argument to pass to the AST routine (most people pass the lksb
		in here but it can be anything you like.)
		
bast		Blocking AST routine. address of a function to call if this 
		lock is blocking another. The function will be called with
		astargs. 

range		an optional structure of two uint64_t that indicate the range
		of the lock. Locks with overlapping ranges will be granted only
		if the lock modes are compatible. locks with non-overlapping
		ranges (on the same resource) do not conflict. A lock with no
		range is assumed to have a range emcompassing the largest
		possible range. ie. 0-0xFFFFFFFFFFFFFFFF.  Note that is is more
		efficient to specify no range than to specify the full range
		above.


dlm_lock operations are asynchronous. If the call to dlm_lock returns an error
then the operation has failed and the AST routine will not be called. If
dlm_lock returns 0 it is still possible that the lock operation will fail. The
AST routine will be called when the locking is complete or has failed and the
status is returned in the lksb.

For conversion operations the name and namelen are ignored and the lock ID in
the LKSB is used to identify the lock.  

If a lock value block is specified then in general, a grant or a conversion to 
an equal-level or higher-level lock mode reads the lock value from the resource 
into the caller's lock value block. When a lock conversion from EX or PW
to an equal-level or lower-level lock mode occurs, the contents of 
the caller's lock value block are written into the resource.

If the AST routines or parameter are passed to a conversion operation then they
will overwrite those values that were passed to a previous dlm_lock call.

int dlm_lock_wait(uint32_t mode,
		  struct dlm_lksb *lksb,
		  uint32_t flags,
		  const void *name,
		  unsigned int namelen,
		  uint32_t parent,
		  void *bastarg,
		  void (*bastaddr) (void *bastarg),
		  struct dlm_range *range);
	      

As above except that the call will block until the lock is
granted or has failed. The return from the function is
the final status of the lock request (ie that was returned
in the lksb after the AST routine was called).



int dlm_unlock(uint32_t lkid, 
               uint32_t flags, 
               struct dlm_lksb *lksb, 
               void *astarg)

lkid		Lock ID as returned in the lksb

flags		flags affecting the unlock operation:
		LKF_CANCEL	CANCEL a pending lock or conversion. 
				This returns the lock to it's
				previously granted mode (in case of a
				conversion) or unlocks it (in case of a
				waiting lock).
					
		LKF_IVVALBLK	Invalidate value block

lksb		LKSB to return status and value block information. 

astarg 		New parameter to be passed to the completion AST.
		The completion AST routine is the
		last completion AST routine specified in a dlm_lock call.
		If dlm_lock_wait() was the last routine to issue a lock, 
		dlm_unlock_wait() must be used to release the lock. If dlm_lock()
		was the last routine to issue a lock then either dlm_unlock()
		or dlm_unlock_wait() may be called.
		
Unlocks are also asynchronous. The AST routine is called when the resource is
successfully unlocked (see below).


Extra status returns to the completion AST (apart from those already
defined in errno.h)

ECANCEL
  A lock conversion was successfully cancelled
  
EUNLOCK
  An Unlock operation completed successfully

EDEADLOCK
  The lock operation is causing a deadlock and has been cancelled. If this
  was a conversion then the lock is reverted to its previously granted state.
  If it was a new lock then it has not been granted. 
  (NB Only conversion deadlocks are currently detected)
  
int dlm_unlock_wait(uint32_t lkid, 
                    uint32 flags, 
                    struct dlm_lksb *lksb)

As above but returns when the unlock operation is complete either because it
finished or because an error was detected. In the case where
the unlock operation was succesful no error is returned.

The return value of the function call is the status of issuing
the unlock operation. The status of the unlock operation itself
is in the lock status block. Both of these must be checked to
verify that the unlock has completed succesfully.

Lock Query Operations
---------------------
int dlm_query(struct dlm_lksb *lksb,
              int query,
	      struct dlm_queryinfo *qinfo,
	      void (*ast_routine(void *astarg)),
	      void *astarg);

The operation is asynchronous, the ultimate status and data will be returned into the
dlm_query_info structure which should be checked when the ast_routine is
called. The lksb must contain a valid lock ID in sb_lkid which is used to 
identify the resource to be queried, status will be returned in sb_status;
As with the locking calls an AST woutine will be called when the query completes
if the call itself returns 0.

int dlm_query_wait(struct dlm_lksb *lksb,
                   int query,
	           struct dlm_queryinfo *qinfo)

Same as dlm_query() except that it waits for the operation to complete.
When the operation is complete the status of will be in the lksb. Both
the return value from the function call and the condition code in the
lksb must be evaluated.

If the provided lock list is too short to hold all the locks, then sb_status
in the lksb will contain -E2BIG but the list will be filled in as far as possible.
Either gqi_lockinfo or gqi_resinfo may be NULL if that information is not required.

/* Structures passed into and out of the query */

struct gdlm_lockinfo
{
    int   lki_lkid;          /* Lock ID on originating node */
    int   lki_parent;
    int   lki_node;          /* Originating node (not master) */
    int   lki_ownpid;        /* Owner pid on the originating node */
    uint8 lki_state;         /* Queue the lock is on */
    int8  lki_grmode         /* Granted mode */
    int8  lki_rqmode;        /* Requested mode */
    struct dlm_range lki_grrange /* Granted range, if applicable */
    struct dlm_range lki_rqrange /* Requested range, if applicable */
};

struct gdlm_resinfo
{
    int	 rsi_length;
    int  rsi_grantcount; /* No. of nodes on grant queue */
    int  rsi_convcount;  /* No. of nodes on convert queue */
    int  rsi_waitcount;  /* No. of nodes on wait queue */
    int  rsi_masternode; /* Master for this resource */
    char rsi_name[DLM_RESNAME_MAXLEN]; /* Resource name */
    char rsi_valblk[DLM_LVB_LEN];      /* Master's LVB contents, if applicable */
};

struct dlm_queryinfo
{
    struct dlm_resinfo  *gqi_resinfo;   /* Points to a single resinfo struct */
    struct dlm_lockinfo *gqi_lockinfo;  /* This points to an array of structs */
    int                   gqi_locksize;  /* input */
    int                   gqi_lockcount; /* output */
};

The query is made up of several blocks of bits as follows:

                   9  8        6  5     3           0
+----------------+---+-------+---+-------+-----------+
|    reserved    | Q | query | F | queue | lock mode |
+----------------+---+-------+---+-------+-----------+

lock mode is a normal DLM lock mode or DLM_LOCK_THIS
to use the mode of the lock in sb_lkid.

queue is a bitmap of
 DLM_QUERY_QUEUE_WAIT
 DLM_QUERY_QUEUE_CONVERT
 DLM_QUERY_QUEUE_GRANT

or one of the two shorthands:
 DLM_QUERY_QUEUE_GRANTED (for WAIT+GRANT)
 DLM_QUERY_QUEUE_ALL     (for all queues)

 F is a flag DLM_QUERY_LOCAL
which specifies that a remote access should not
happen. Only lock information that can
be gleaned from the local node will be returned so
be aware that it may not be complete.

The query is one of the following:
 DLM_QUERY_LOCKS_HIGHER
 DLM_QUERY_LOCKS_LOWER
 DLM_QUERY_LOCKS_EQUAL
 DLM_QUERY_LOCKS_BLOCKING
 DLM_QUERY_LOCKS_NOTBLOCK
 DLM_QUERY_LOCKS_ALL

which specifies which locks to look for by mode,
either the lockmode is lower, equal or higher
to the mode at the bottom of the query. 
DLM_QUERY_ALL will return all locks on the
resource.

DLM_QUERY_LOCKS_BLOCKING returns only locks
that are blocking the current lock. The lock
must not be waiting for grant or conversion
for this to be a valid query, the other flags
are ignored.

DLM_QUERY_LOCKS_NOTBLOCKING returns only locks
that are granted but NOT blocking the current lock.

Q specifies which lock queue to compare. By default
the granted queue is used. If the flags 
DLM_QUERY_RQMODE is set then the requested mode
will be used instead.


The "normal" way to call dlm_query is to put the
address of the dlm_queryinfo struct into 
lksb.sb_lvbptr and pass the lksb as the AST param,
that way all the information is available to you
in the AST routine.

Lockspace Operations
--------------------
The DLM allows locks to be partitioned into "lockspaces", and these can be
manipulated by userspace calls. It is possible (though not recommended) for
an application to have multiple lockspaces open at one time. 

All the above calls work on the "default" lockspace, which should be
fine for most users. The calls below allow you to isolate your
application from all others running in the cluster. Remember, lockspaces
are a cluster-wide resource, so if you create a lockspace called "myls" it 
will share locks with a lockspace called "myls" on all nodes.


dlm_lshandle_t dlm_create_lockspace(const char *name, mode_t mode);

  This creates a lockspace called <name> and the mode of the file 
  user to access it wil be <mode> (subject to umask as usual). 
  The lockspace must not already exist on this node, if it does -1
  will be returned and errno will be set to EEXIST. If you really
  want to use this lockspace you can then user dlm_open_lockspace()
  below. The name is the name of a misc device that will be created
  in /dev/misc.

  On success a handle to the lockspace is returned, which can be used 
  to pass into subsequent dlm_ls_lock/unlock calls. Make no assumptions
  as to the content of this handle as it's content may change in future.

  The caller must have CAP_SYSADMIN privileges to do this operation.


int dlm_release_lockspace(const char *name, dlm_lshandle_t lockspace, int force)

  Deletes a lockspace. If the lockspace still has active locks then -1 will be
  returned and errno set to EBUSY. Both the lockspace handle /and/ the name
  must be specified. This call also closes the lockspace and stops the thread
  associated with the lockspace, if any.

  Note that other nodes in the cluster may still have locks open on this 
  lockspace.
  This call only removes the lockspace from the current node.

  If the force flag is set then the lockspace will be removed even if another
  user on this node has active locks in it. Existing users will NOT
  be notified if you do this, so be careful.


dlm_lshandle_t dlm_open_lockspace(const char *name)

  Opens an already existing lockspace and returns a handle to it.


int dlm_close_lockspace(dlm_lshandle_t lockspace)

  Close the lockspace. Any locks held by this process will be freed.
  If a thread is associated with this lockspace then it will be stopped.


int dlm_ls_get_fd(dlm_lshandle_t lockspace)

  Returns the file descriptor associated with the lockspace so that the
  user call use it as input to poll/select.


int dlm_ls_pthread_init(dlm_lshandle_t lockspace)

  Initialise threaded environment for this lockspace, similar
  to dlm_pthread_init() above.


int dlm_ls_lock(dlm_lshandle_t lockspace,
		int mode,
		struct dlm_lksb *lksb,
		uint32_t flags,
		void *name,
		unsigned int namelen,
		uint32_t parent,
		void (*ast) (void *astarg),
		void *astarg,
		void (*bast) (void *astarg),
		struct dlm_range *range)

  Same as dlm_lock() above but takes a lockspace argument.

int dlm_ls_lock_wait(dlm_lshandle_t lockspace,
		     int mode,
		     struct dlm_lksb *lksb,
		     uint32_t flags,
		     void *name,
		     unsigned int namelen,
		     uint32_t parent,
		     void *bastarg,
		     void (*bast) (void *bastarg),
		     struct dlm_range *range)

  Same as dlm_lock_wait() above but takes a lockspace argument.


int dlm_ls_unlock(dlm_lshandle_t lockspace,
		  uint32_t lkid, 
                  uint32_t flags, 
                  struct dlm_lksb *lksb, 
                  void *astarg)


  Same as dlm_unlock above but takes a lockspace argument.

int dlm_ls_unlock_wait(dlm_lshandle_t lockspace,
		       uint32_t lkid, 
                       uint32_t flags, 
                       struct dlm_lksb *lksb)


  Same as dlm_unlock_wait above but takes a lockspace argument.


int dlm_ls_query(dlm_lshandle_t lockspace,
                 struct dlm_lksb *lksb,
                 int query,
	         struct dlm_queryinfo *qinfo,
	         void (*ast_routine(void *astarg)),
	         void *astarg);

  Same as dlm_query above but takes a lockspace argument.

int dlm_ls_query_wait(dlm_lshandle_t lockspace,
                      struct dlm_lksb *lksb,
                      int query,
	              struct dlm_queryinfo *qinfo)

  Same as dlm_query_wait above but takes a lockspace argument.


One further point about lockspace operations is that there is no locking
on the creating/destruction of lockspaces in the library so it is up to the
application to only call dlm_*_lockspace when it is sure that
no other locking operations are likely to be happening within that process.

Libraries
---------
There are two DLM libraries, one that uses pthreads (libdlm) to deliver ASTs
and a light one one that doesn't (libdlm_lt). 

The "light" library contains only the following calls.

- dlm_lock
- dlm_unlock
- dlm_query
- dlm_get_fd
- dlm_dispatch
- dlm_ls_lock
- dlm_ls_unlock
- dlm_ls_query
- dlm_ls_get_fd
- dlm_create_lockspace
- dlm_open_lockspace
- dlm_release_lockspace
- dlm_close_lockspace

Note that libdlm (the pthreads one) also contains the non-threaded calls
so you can choose at runtime if you need to.
