OpenDLM Glossary (definitions of terms and acronyms)
Copyright 2004 The OpenDLM Project
Author:
Ben Cahill (bc), ben.m.cahill@intel.com
Contact the author.
barrier: a cluster-wide lock recovery (dlm_recover.c) state that must be
	reached by all nodes in the cluster before any node can move to the next
	recovery state. (See "recovery", below).
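	A minimal sketch of the barrier idea, with hypothetical names and
	types (this is not the actual dlm_recover.c code):

		/* Hypothetical recovery barrier; names are illustrative only. */
		#include <stdbool.h>

		struct barrier {
			int nodes_total;    /* active nodes in the cluster */
			int nodes_arrived;  /* nodes that have reached this state */
		};

		/* Called as each node reports reaching the barrier state;
		   returns true once every node has arrived, i.e. the cluster
		   may move on to the next recovery state. */
		static bool barrier_arrive(struct barrier *b)
		{
			b->nodes_arrived++;
			return (b->nodes_arrived == b->nodes_total);
		}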
cccp_: "clever cluster communications protocol", the inter-node comm service
built into OpenDLM.
clm_: "cluster lock manager"
clmr_: "cluster lock manager reconfiguration"
cluster: several computers working together, communicating over a local area
network of some sort.
convert: change the state of a pre-existing lock (directory entry and master
already exist somewhere, lock record already exists on *this* node).
(See "create", below).
create: create a new lock. The first node to create a lock on a resource
becomes the resource's master node, and the directory node creates a new
directory entry for the resource. (See "convert", above).
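	The create/convert distinction, sketched in hypothetical C
	(illustrative types and names, not OpenDLM's actual request path):

		#include <stdbool.h>

		/* Hypothetical request; not OpenDLM's actual types. */
		struct lock_request {
			const char *res_name; /* resource to lock */
			int         mode;     /* requested lock mode */
			bool        convert;  /* true: convert an existing lock */
			int         lock_id;  /* valid only when convert is true */
		};

		/* Stubs standing in for the real create/convert paths. */
		static int create_new_lock(const char *name, int mode)
		{ (void)name; (void)mode; return 0; }
		static int convert_existing_lock(int id, int mode)
		{ (void)id; (void)mode; return 0; }

		static int handle_request(struct lock_request *req)
		{
			if (req->convert)
				/* Record already exists on this node; directory
				   entry and master already exist somewhere. */
				return convert_existing_lock(req->lock_id, req->mode);
			/* New lock: the directory node may create a new entry,
			   and the first creator becomes the resource's master. */
			return create_new_lock(req->res_name, req->mode);
		}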
cti: client transaction interface (clm_cti.c). (See "pti", below).
directory: a node that knows the lock masters for a given set of lockable
resources. The directory node for a given resource is determined by a
hash formula, based on the resource name and the number of active nodes
in the cluster. When cluster membership changes, some directory entries
may need to migrate from one node to another.
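	Directory-node selection might look like the following sketch (the
	hash here is hypothetical; OpenDLM's real formula may differ, but the
	shape is the same: hash the name, reduce modulo the node count):

		/* Hypothetical directory-node selection. */
		static unsigned int hash_name(const char *name)
		{
			unsigned int h = 5381;  /* djb2-style string hash */
			while (*name)
				h = h * 33 + (unsigned char)*name++;
			return h;
		}

		static int directory_node(const char *res_name, int active_nodes)
		{
			/* A change in active_nodes changes the result for many
			   resources; those directory entries must migrate. */
			return hash_name(res_name) % active_nodes;
		}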
distributed lock manager (DLM): an inter-node lock management system in which
all elements are distributed among cluster nodes, so that the crash of a
node will not cause the entire cluster to crash. That is, there is no
single-point-of-failure (SPOF) in a true DLM.
grace period: time during which the cluster is in a recovery state.
hsm_, HSM: "hierarchical state machine". OpenDLM uses a hierarchical state tree
for controlling the state of cluster membership and lock state recovery for
a given node. Some states, known interchangeably as "parent", "super",
or "ancestor" states, have sub-states. When moving from one state to
another, the state machine may need to traverse up the tree from the
"current" state, then down another branch to reach the state known
interchangeably as the "destination", "target", or "next" state. It
will do this via the "least common ancestor".
in-flight: a message is "in-flight" if it has been sent by one node, but the
	receiving node has not yet processed it and sent back a response
	message. Note that the request message may have been successfully
	transmitted to the receiving node, yet still be "in-flight", because
	it hasn't been processed and acknowledged yet.
LCA: "least common ancestor". The lowest level node of the hierarchical
state tree that is common to both the current and the next state.
	Used in the hierarchical state machine (hsm.c) for lock recovery.
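	A minimal sketch of the LCA computation over a state tree
	(illustrative names, not the actual hsm.c code):

		/* Each state knows its parent and its depth in the tree. */
		struct state {
			struct state *parent;  /* NULL at the root */
			int depth;             /* root has depth 0 */
		};

		/* Walk the deeper state up until both sit at the same depth,
		   then climb both in lockstep until they meet.  A transition
		   exits states from "current" up to (not including) the LCA,
		   then enters states down to "next". */
		static struct state *
		least_common_ancestor(struct state *a, struct state *b)
		{
			while (a->depth > b->depth)
				a = a->parent;
			while (b->depth > a->depth)
				b = b->parent;
			while (a != b) {
				a = a->parent;
				b = b->parent;
			}
			return a;
		}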
lock manager: Each node has a lock manager that does the real work of
creating/converting/deleting locks. The lock managers work together to
provide cluster-wide distributed lock management.
LVB: "lock value block". client-specific data that is attached to a resource,
and shared cluster-wide among the locks on the resource. The legacy
VMS-compatible LVB size is 16 bytes, but OpenGFS requires 32 bytes.
master: a node that knows the status of all locks throughout the cluster
for a given resource. Same as "primary".
migration: the process of moving resource directory or lock state information
from one node to another.
node: a computer that is a member of a cluster. Also can refer to an entry
	in the wait-for graph (see TWFG, below).
primary: a master node for a given resource. The primary copy of the resource,
held within the primary node, knows about all locks on the resource
throughout the cluster. See "master", "secondary", "resource".
pti: primary (i.e. master node) transaction interface (clm_pti.c).
(See "cti", above).
purge:
rc_: "recovery" (see below)
recovery: the process of (re-)establishing a stable cluster-wide membership
	model, and a stable cluster-wide distribution of lock state and lock
	mastership directory information, after a node crashes.
resource: a lockable entity. It is identified by name and type (UNIX vs VMS),
and is represented by a struct resource. Each node that knows about the
resource keeps a copy of the resource structure. The "primary" or "master"
node for the resource keeps track of all nodes' locks on the resource,
but a "secondary" or "non-master" node knows only about its own locks.
	The structure supports three queues (grant, convert, and wait) for
	locks on the resource, as well as a lock value block shared by all
	locks on the resource.
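	The shape of such a record, sketched with illustrative field names
	(see struct resource in the source for the real layout):

		#define LVB_SIZE 32	/* OpenGFS needs 32; VMS legacy is 16 */

		struct lock;		/* one node's lock on the resource */

		struct lock_queue {
			struct lock *head;
			struct lock *tail;
		};

		struct resource {
			char name[64];			/* resource name */
			int  type;			/* UNIX vs VMS */
			struct lock_queue grant;	/* granted locks */
			struct lock_queue convert;	/* pending converts */
			struct lock_queue wait;		/* waiting requests */
			unsigned char lvb[LVB_SIZE];	/* lock value block,
							   shared by all locks
							   on the resource */
		};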
SCN, scn_: "System Commit Number", a counter used to ??
secondary: A non-master node for a given resource. A secondary copy of the
resource needs to know only about the locks on its own node, rather than
across the whole cluster. See "primary" and "resource".
TWFG: "transaction wait-for graph", used in deadlock detection
	(clm_deadlock.c).
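	Deadlock detection on the wait-for graph amounts to finding a cycle.
	A toy single-node sketch (clm_deadlock.c's real algorithm spans
	cluster nodes and is more involved); a full check would call
	has_cycle_from() once per transaction:

		#include <stdbool.h>

		#define MAX_TX 64

		/* waits_for[a][b]: transaction a waits on a lock held by b */
		static bool waits_for[MAX_TX][MAX_TX];
		static bool on_path[MAX_TX];	/* DFS stack membership */

		/* Depth-first search; a back edge to a transaction already
		   on the DFS path means a cycle, hence a deadlock. */
		static bool has_cycle_from(int tx, int ntx)
		{
			int other;

			if (on_path[tx])
				return true;
			on_path[tx] = true;
			for (other = 0; other < ntx; other++)
				if (waits_for[tx][other] &&
				    has_cycle_from(other, ntx))
					return true;
			on_path[tx] = false;
			return false;
		}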
udp: "user datagram protocol", a connectionless IP transport protocol.