Outline - Magma/CAL - Cluster Abstraction Library

Section One: Plugin Design considerations.

I. Required/recommended plugin functions
	A. All plugins must provide the following functions:
		1. s_member_list
		2. s_quorum_status
		3. s_get_event
		4. s_open
		5. s_close

	B. All plugins should provide the following functions:
		1. s_null
		2. s_plugin_version
		3. s_fence

	C. Plugins which support locking should provide the following
	   functions:
		1. s_lock
		2. s_unlock

	D. Plugins which support service groups or node groups should
	   provide the following functions:
		1. s_login
		2. s_logout

	E. All plugins must provide the following dl-mappable functions
	   (these are checked by cp_load() using dlsym()):
		1. cluster_plugin_load
		2. cluster_plugin_init
		3. cluster_plugin_unload
		4. cluster_plugin_version
			i. This can be inherited by placing:
			   "IMPORT_PLUGIN_API_VERSION()" near the top
			   of your source code.

	F. Plugins which do not implement a given function should leave
	   the corresponding function pointer alone; cp_load() maps all
	   functions in a given cluster_plugin_t object to "unimplemented"
	   versions, so there is no danger of dereferencing NULL function
	   pointers when cp_load() is used.

	G. When the caller calls cp_unload() on your object, the object
	   must be logged out and all of its resources released without
	   further intervention from the caller.
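
	The default mapping described in F can be pictured with a small
	sketch. All names below (demo_plugin_t, demo_cp_load(), my_init())
	are hypothetical stand-ins for cluster_plugin_t and cp_load(), and
	returning -ENOSYS from the placeholder is an assumption, not
	magma's documented behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for cluster_plugin_t; magma's real
 * structure has more members and different signatures. */
typedef struct demo_plugin {
	int (*s_open)(struct demo_plugin *self);
	int (*s_close)(struct demo_plugin *self);
	int (*s_null)(struct demo_plugin *self);
	void *priv;			/* plugin-private state */
} demo_plugin_t;

/* Safe placeholder for any operation the plugin does not implement. */
static int demo_unimplemented(demo_plugin_t *self)
{
	(void)self;
	return -ENOSYS;
}

static int my_open(demo_plugin_t *self)
{
	assert(self != NULL);
	return 0;
}

/* Plugin init: overrides only the operations it actually supports. */
static int my_init(demo_plugin_t *p)
{
	p->s_open = my_open;
	return 0;
}

/* Loader: every slot starts as the placeholder, then the plugin's init
 * routine fills in what it implements, so no slot is ever NULL. */
static int demo_cp_load(demo_plugin_t *p, int (*init)(demo_plugin_t *))
{
	p->s_open = demo_unimplemented;
	p->s_close = demo_unimplemented;
	p->s_null = demo_unimplemented;
	p->priv = NULL;
	return init(p);
}
```

	Because the loader installs the placeholders before calling the
	plugin's init routine, a plugin only assigns the slots it supports.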

II. Plugin coding style.
	A. Please use Linux-Kernel Coding Style.

III. Plugin Design Considerations.
	A. Plugins should be designed as re-entrant entities.
		1. This means no internal pthread_* locking calls.
		2. They should keep their state in private data structures.
			i. This avoids the need for internal global symbols.
			ii. This removes the need for internal locking, making
			   each object internally thread-safe.
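
	A minimal sketch of the private-data approach, written in the
	Linux kernel style requested above. struct demo_obj and struct
	demo_priv are hypothetical; the point is that each open object owns
	its own state, so two objects used from different threads share
	nothing that would need a pthread lock.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-object state; one allocation per open object. */
struct demo_priv {
	char group[64];
	int logged_in;
};

struct demo_obj {
	struct demo_priv *priv;
};

static int demo_open(struct demo_obj *obj, const char *group)
{
	/* calloc() zeroes the struct, so group stays NUL-terminated. */
	struct demo_priv *p = calloc(1, sizeof(*p));

	if (!p)
		return -1;
	strncpy(p->group, group, sizeof(p->group) - 1);
	obj->priv = p;
	return 0;
}

static void demo_login(struct demo_obj *obj)
{
	assert(obj->priv != NULL);
	obj->priv->logged_in = 1;	/* touches only this object's state */
}

/* Releases everything the object owns; nothing is left to the caller. */
static void demo_close(struct demo_obj *obj)
{
	free(obj->priv);
	obj->priv = NULL;
}
```

	Calling demo_close() on each object releases everything it owns,
	which is also what item G above asks of cp_unload().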


Section Two: What magma provides for you

I. High-level cluster connection functions
	A. clu_connect(char *group, int login)
		1. Connects to the underlying cluster infrastructure and,
		   if <login> is nonzero, logs in to the group specified
		   by <group>.
		2. Loads and unloads plugins in turn until one connects
		   (and, if requested, logs in) successfully.
		3. Returns a select(2)able file descriptor on which you
		   may look for events.

	B. clu_disconnect(int fd)
		1. Disconnects from the underlying infrastructure and
		   logs out.

	C. clu_member_list(char *group)
		1. Returns a "cluster_member_list_t *" structure (see
		   src/magma.h) containing members of the given <group>.
		2. Does not require being logged in to <group>.

	D. clu_quorum_status(char *group)
		1. Returns a set of flags describing quorum and group
		   state for <group>:
			i. QF_QUORATE -> Node is quorate.
			ii. QF_GROUPMEMBER -> Node is a member of the
			   specified group.
		2. Does not require being logged in to <group>.
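
	A caller would typically test both flags before acting on group
	state. The numeric values below are placeholders (assumptions),
	not magma's definitions; real code should include the magma
	header instead of defining them.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values for illustration only; use magma's header. */
#define QF_QUORATE	0x1
#define QF_GROUPMEMBER	0x2

/* True only when the cluster is quorate AND this node is in the group. */
static bool demo_may_run(int flags)
{
	return (flags & QF_QUORATE) && (flags & QF_GROUPMEMBER);
}
```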

	E. clu_fence(cluster_member_t *node) [NEW]
		1. Fences <node> or marks it as expired.
		2. This function is asynchronous; the caller then waits
		   for an event to be returned from the cluster.

	F. clu_get_event(int fd)
		1. Retrieves an event from <fd>.
		2. Returns one of:
			i. CE_NULL - Ignore this event.
			ii. CE_MEMB_CHANGE - Membership change
			iii. CE_SUSPEND - Pause until the next
			   CE_MEMB_CHANGE event is received
			iv. CE_QUORATE - Cluster quorum formed
			v. CE_INQUORATE - Cluster quorum dissolved
			vi. CE_SHUTDOWN - Local node is leaving the
			   cluster.  Clean up and exit immediately.
			

II. Local node identification information
	A. clu_local_nodename(char *group, char *buf, int buflen)
		1. Copies the local node name from the membership list
		   of <group> into <buf>.
		2. <buf> must be pre-allocated by the caller and be
		   <buflen> bytes long.

	B. clu_local_nodeid(char *group, uint64_t *nodeid)
		1. Copies the local node ID from the membership list of
		   <group> into <nodeid>.
		2. <nodeid> must point to a caller-allocated 64-bit
		   integer.

III. Membership list manipulation and querying
	A. clu_members_gained(cluster_member_list_t *old,
			      cluster_member_list_t *new)
		1. Returns a list of members which are present or online
		   in <new> but were absent or offline in <old>.
		2. The caller must call cml_free() to free the returned
		   structure.

	B. clu_members_lost(cluster_member_list_t *old,
			   cluster_member_list_t *new)
		1. Returns a list of members which were present or online
		   in <old> but are absent or offline in <new>.
		2. The caller must call cml_free() to free the returned
		   structure.
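
	Both computations are set differences keyed on node ID. The sketch
	below uses simplified, hypothetical member and list types (magma's
	cluster_member_t carries more fields) and frees results with
	free(), where code using magma would call cml_free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for cluster_member_t and
 * cluster_member_list_t. */
struct demo_member {
	uint64_t id;
	int online;
};

struct demo_list {
	size_t count;
	struct demo_member nodes[];	/* flexible array member */
};

static struct demo_list *demo_list_alloc(size_t n)
{
	struct demo_list *l = malloc(sizeof(*l) +
				     n * sizeof(struct demo_member));

	if (l)
		l->count = 0;
	return l;
}

/* Nonzero if node <id> is listed in <l> and marked online. */
static int demo_is_online(const struct demo_list *l, uint64_t id)
{
	for (size_t i = 0; i < l->count; i++)
		if (l->nodes[i].id == id)
			return l->nodes[i].online;
	return 0;
}

/* Members online in <b> that were absent or offline in <a>.
 * demo_diff(old, new) behaves like clu_members_gained();
 * demo_diff(new, old) behaves like clu_members_lost(). */
static struct demo_list *demo_diff(const struct demo_list *a,
				   const struct demo_list *b)
{
	struct demo_list *out;

	assert(a != NULL && b != NULL);
	out = demo_list_alloc(b->count);
	if (!out)
		return NULL;
	for (size_t i = 0; i < b->count; i++)
		if (b->nodes[i].online && !demo_is_online(a, b->nodes[i].id))
			out->nodes[out->count++] = b->nodes[i];
	return out;
}
```

	The nested search is O(n*m), which is usually acceptable for
	cluster-sized lists and keeps the sketch short.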

	C. memb_online(cluster_member_list_t *nodes, uint64_t nodeid)
		1. Returns 1 if <nodeid> is present and/or online in
		   <nodes>.

	D. memb_name_to_id(cluster_member_list_t *nodes, char *name)
		1. Returns the 64-bit node ID of the member named <name>.

	E. memb_name_to_p(cluster_member_list_t *nodes, char *name)
		1. Returns a pointer (cluster_member_t *) to the node in
		   <nodes> named <name>.

	F. memb_id_to_name(cluster_member_list_t *nodes, uint64_t
			   nodeid)
		1. Returns the name (char *) of node <nodeid> if it
		   exists in <nodes>.

	G. memb_id_to_p(cluster_member_list_t *nodes, uint64_t nodeid)
		1. Returns a pointer (cluster_member_t *) to the node in
		   <nodes> with ID <nodeid>.
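
	These lookups amount to linear searches over the list. The sketch
	uses hypothetical simplified types, and returning NULL (or 0) for
	a node that is not in the list is an assumption made for
	illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified member and list types. */
struct demo_member {
	uint64_t id;
	char name[64];
	int online;
};

struct demo_list {
	size_t count;
	struct demo_member *nodes;
};

static struct demo_member *demo_id_to_p(struct demo_list *l, uint64_t id)
{
	assert(l != NULL);
	for (size_t i = 0; i < l->count; i++)
		if (l->nodes[i].id == id)
			return &l->nodes[i];
	return NULL;
}

static struct demo_member *demo_name_to_p(struct demo_list *l,
					  const char *name)
{
	assert(l != NULL && name != NULL);
	for (size_t i = 0; i < l->count; i++)
		if (strcmp(l->nodes[i].name, name) == 0)
			return &l->nodes[i];
	return NULL;
}

/* The remaining helpers are thin wrappers over the two searches. */
static const char *demo_id_to_name(struct demo_list *l, uint64_t id)
{
	struct demo_member *m = demo_id_to_p(l, id);

	return m ? m->name : NULL;
}

static int demo_online(struct demo_list *l, uint64_t id)
{
	struct demo_member *m = demo_id_to_p(l, id);

	return m && m->online;
}
```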

	H. memb_resolve(cluster_member_t *member)
		1. Resolves <member>'s addresses using getaddrinfo() and
		   stores the result in member->cm_addrs.

	I. memb_resolve_list(cluster_member_list_t *new,
			     cluster_member_list_t *old)
		1. Moves all existing cm_addrs pointers from <old>
		   to <new>.
		2. Resolves any previously unresolved nodes in
		   <old> and stores them in <new>.

	J. cml_free(cluster_member_list_t *dead)
		1. Cleans up all cm_addrs fields (using freeaddrinfo)
		2. Frees <dead> structure.
		3. ALWAYS use this to free cluster_member_list_t
		   structures
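
	A sketch of the resolve/reuse/free lifecycle with the standard
	getaddrinfo() and freeaddrinfo() calls. struct demo_member and its
	addrs field are hypothetical stand-ins for cluster_member_t and
	cm_addrs, and matching members by name in demo_move_addrs() is an
	assumption about how memb_resolve_list() pairs old and new entries.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical member with only the fields this sketch needs. */
struct demo_member {
	const char *name;
	struct addrinfo *addrs;		/* plays the role of cm_addrs */
};

/* Resolve m->name for both IPv4 and IPv6, keeping the list on the
 * member.  Returns 0 or a getaddrinfo() error code. */
static int demo_resolve(struct demo_member *m)
{
	struct addrinfo hints;

	assert(m != NULL && m->name != NULL);
	if (m->addrs)
		return 0;			/* already resolved */
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;		/* IPv4 and IPv6 */
	hints.ai_socktype = SOCK_STREAM;
	return getaddrinfo(m->name, NULL, &hints, &m->addrs);
}

/* Hand an existing result from an old entry to a new one instead of
 * resolving the same name again. */
static void demo_move_addrs(struct demo_member *to, struct demo_member *from)
{
	if (!to->addrs && from->addrs && strcmp(to->name, from->name) == 0) {
		to->addrs = from->addrs;
		from->addrs = NULL;	/* exactly one owner at a time */
	}
}

/* Per-member cleanup of the kind cml_free() is described as doing. */
static void demo_release(struct demo_member *m)
{
	if (m->addrs) {
		freeaddrinfo(m->addrs);
		m->addrs = NULL;
	}
}
```

	Clearing the pointer on the old entry after the move ensures the
	address list is freed exactly once.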

	K. print_member_list(cluster_member_list_t *list, int verbose)
		1. Dumps all member names, IDs, and their addresses
		   to stdout.

IV. IPv4 / IPv6 abstracted TCP messaging functions
	A. msg_update(cluster_member_list_t *new_membership)
		1. Frees the previously stored membership list and
		   replaces it with <new_membership>.
		2. Resolves all members using memb_resolve_list.
		3. Should be called whenever you receive a new
		   membership list from the cluster infrastructure.

	B. msg_listen(uint16_t baseport, int *fds, int fds_len)
		1. Sets up listening sockets on <baseport> (IPv4)
		   and <baseport>+IPV6_PORT_OFFSET (IPv6).
		2. Stores the file descriptors in <fds>.
		3. Returns the number of listening sockets.

	C. msg_open(uint64_t nodeid, uint16_t baseport)
		1. Connects to <nodeid> via <baseport>.  First tries
		   IPv6; falls back to IPv4.

	D. msg_send(int fd, void *buf, size_t buflen)
		1. Sends <buflen> bytes from <buf> over <fd>.
		2. A count and a 32-bit CRC are prepended to <buf> to
		   aid in the detection of lost data.

	E. msg_receive(int fd, void *buf, size_t buflen)
		1. Receives <buflen> bytes from <fd>; stores in <buf>
		2. The received contents are checked against a 32-bit
		   CRC for basic integrity.
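
	The count-plus-CRC framing can be sketched as below. The header
	layout (two host-order 32-bit fields) and the use of the common
	reflected CRC-32 polynomial 0xEDB88320 are assumptions; magma's
	actual wire format may differ. The sketch only builds and checks a
	frame in memory and leaves out the socket I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise for brevity. */
static uint32_t demo_crc32(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}

/* Hypothetical header: payload byte count plus payload CRC. */
struct demo_hdr {
	uint32_t count;
	uint32_t crc;
};

/* Write header + payload into <out>; returns bytes written, 0 if short. */
static size_t demo_frame(const void *buf, uint32_t buflen, void *out,
			 size_t outlen)
{
	struct demo_hdr h = { buflen, demo_crc32(buf, buflen) };

	if (outlen < sizeof(h) + buflen)
		return 0;
	memcpy(out, &h, sizeof(h));
	memcpy((char *)out + sizeof(h), buf, buflen);
	return sizeof(h) + buflen;
}

/* Validate a received frame; returns the payload length, or -1 if the
 * count or the CRC does not match what actually arrived. */
static long demo_unframe(const void *in, size_t inlen, const void **payload)
{
	struct demo_hdr h;

	assert(in != NULL && payload != NULL);
	if (inlen < sizeof(h))
		return -1;
	memcpy(&h, in, sizeof(h));
	if (inlen - sizeof(h) != h.count)
		return -1;			/* data lost or extra */
	*payload = (const char *)in + sizeof(h);
	if (demo_crc32(*payload, h.count) != h.crc)
		return -1;			/* data corrupted */
	return (long)h.count;
}
```

	demo_unframe() rejects a frame whose length disagrees with its count
	or whose payload fails the CRC, which is the kind of basic integrity
	check described above.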

	F. msg_receive_timeout(int fd, void *buf, size_t buflen,
			       int timeout)
		1. Receives <buflen> bytes from <fd>; stores in <buf>
		2. Aborts after <timeout> seconds.
		3. The received contents are checked against a 32-bit
		   CRC for basic integrity.
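
	A timed receive is commonly built from select(2) and read(2). This
	sketch (hypothetical name demo_read_timeout()) reads exactly <len>
	bytes. As a simplification, the timeout restarts for each wait
	rather than bounding the whole call, and reporting an early EOF as
	EPIPE is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Read exactly <len> bytes from <fd>.  Each wait for more data may last
 * up to <timeout> seconds.  Returns len, or -1 with errno set:
 * ETIMEDOUT on timeout, EPIPE if the peer closed early. */
static ssize_t demo_read_timeout(int fd, void *buf, size_t len, int timeout)
{
	char *p = buf;
	size_t got = 0;

	assert(fd >= 0 && buf != NULL);
	while (got < len) {
		fd_set rfds;
		/* Re-armed every iteration; select() may modify it. */
		struct timeval tv = { .tv_sec = timeout, .tv_usec = 0 };
		ssize_t n;
		int rv;

		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		rv = select(fd + 1, &rfds, NULL, NULL, &tv);
		if (rv < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (rv == 0) {
			errno = ETIMEDOUT;
			return -1;
		}
		n = read(fd, p + got, len - got);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			errno = EPIPE;	/* EOF before len bytes arrived */
			return -1;
		}
		got += (size_t)n;
	}
	return (ssize_t)got;
}
```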

	G. msg_close(int fd)
		1. Closes <fd>

V. Cluster Locking API
	A. clu_lock(char *resource, int flags, void **lockpp)
		1. Obtains a cluster-wide lock for <resource>
		2. <*(lockpp)> is allocated; a plugin-specific
		   lock handle is stored inside.
		3. The same lock handle should be passed back to
		   clu_unlock()
		4. Lock flags
			i. CLK_NOWAIT - return EAGAIN if unavailable
			ii. CLK_WRITE - Write lock
			iii. CLK_READ - Read lock
			iv. CLK_EX - Exclusive lock

	B. clu_unlock(char *resource, void *lockp)
		1. Releases cluster-wide lock for <resource>
		2. <lockp> is freed.


Section Three: Notes on included plugins

I. GuLM - The Grand Unified Lock Manager (gulm.so)
	A. This plugin does not have a notion of node groups.
	B. The GuLM locking system does not support locks held by
	   multiple processes on the same node.  To get around this,
	   we take a POSIX lock on a file before asking GuLM for a lock.
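
	The workaround can be sketched with an fcntl(2) record lock taken
	before the cluster lock is requested. The lock-file path and the
	helper names are hypothetical, and the GuLM request itself is only
	marked by a comment.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Set (lock != 0) or clear a whole-file POSIX write lock on <fd>.  With
 * wait != 0 this blocks (F_SETLKW); otherwise it fails immediately if
 * another process holds a conflicting lock (F_SETLK). */
static int demo_file_lock(int fd, int lock, int wait)
{
	struct flock fl = {
		.l_type = lock ? F_WRLCK : F_UNLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,		/* 0 covers the whole file */
	};

	assert(fd >= 0);
	return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Hypothetical helper: serialize local processes on a per-resource
 * file first; only the holder of this lock then asks GuLM. */
static int demo_local_lock(const char *path, int *fdp)
{
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	if (demo_file_lock(fd, 1, 1) < 0) {
		close(fd);
		return -1;
	}
	/* The GuLM lock request would go here. */
	*fdp = fd;
	return 0;
}
```

	POSIX record locks are owned by the process, so threads within one
	process do not exclude each other, and closing any descriptor for
	the file drops the process's lock on it; the descriptor therefore
	has to stay open until the GuLM lock is released.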

II. CMAN - Using DLM to sync after transition (cman.so)
	A. This plugin does not have a notion of node groups.

III. CMAN - Using Kernel Service Manager (sm.so)
	A. This plugin uses two different methods to query node group
	   information.
	B. This is the only plugin which returns the CE_SUSPEND event.
	   Applications that use CE_SUSPEND should also be able to
	   operate without it, since the other plugins never return it.

