OpenDLM Glossary (definitions of terms and acronyms)
Copyright 2004 The OpenDLM Project
Author: Ben Cahill (bc), email@example.com

barrier: a cluster-wide lock recovery (dlm_recover.c) state that must be reached by all nodes in the cluster before any node can move to the next recovery state. (See "recovery", below).

cccp_: "clever cluster communications protocol", the inter-node communication service built into OpenDLM.

clm_: "cluster lock manager".

clmr_: "cluster lock manager reconfiguration".

cluster: several computers working together, communicating over a local area network of some sort.

convert: change the state of a pre-existing lock (the directory entry and master already exist somewhere, and a lock record already exists on *this* node). (See "create", below).

create: create a new lock. The first node to create a lock on a resource becomes the resource's master node, and the directory node creates a new directory entry for the resource. (See "convert", above).

cti: client transaction interface (clm_cti.c). (See "pti", below).

directory: a node that knows the lock masters for a given set of lockable resources. The directory node for a given resource is determined by a hash formula, based on the resource name and the number of active nodes in the cluster. When cluster membership changes, some directory entries may need to migrate from one node to another.

distributed lock manager (DLM): an inter-node lock management system in which all elements are distributed among cluster nodes, so that the crash of a node will not cause the entire cluster to crash. That is, there is no single point of failure (SPOF) in a true DLM.

grace period: time during which the cluster is in a recovery state.

hsm_, HSM: "hierarchical state machine". ODLM uses a hierarchical state tree for controlling the state of cluster membership and lock state recovery for a given node. Some states, known interchangeably as "parent", "super", or "ancestor" states, have sub-states. When moving from one state to another, the state machine may need to traverse up the tree from the "current" state, then down another branch to reach the state known interchangeably as the "destination", "target", or "next" state. It does this via the "least common ancestor". (See "LCA", below).

in-flight: a message is "in-flight" if it has been sent by one node, but the receiving node has not yet processed the request and sent back a response message. Note that the request message may have been successfully transmitted to the receiving node, yet still be "in-flight" because it has not been processed and acknowledged yet.

LCA: "least common ancestor". The lowest-level node of the hierarchical state tree that is common to both the current and the next state. Used in the hierarchical state machine (hsm.c) for lock recovery.

lock manager: each node has a lock manager that does the real work of creating/converting/deleting locks. The lock managers work together to provide cluster-wide distributed lock management.

LVB: "lock value block". Client-specific data that is attached to a resource, and shared cluster-wide among the locks on the resource. The legacy VMS-compatible LVB size is 16 bytes, but OpenGFS requires 32 bytes.

master: a node that knows the status of all locks throughout the cluster for a given resource. Same as "primary".

migration: the process of moving resource directory or lock state information from one node to another.

node: a computer which is a member of a cluster. Also, can mean an entry in the wait-for graph (see "TWFG", below).

primary: a master node for a given resource. The primary copy of the resource, held within the primary node, knows about all locks on the resource throughout the cluster. See "master", "secondary", "resource".

pti: primary (i.e. master node) transaction interface (clm_pti.c). (See "cti", above).
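As an illustration only (this is NOT OpenDLM's actual hsm.c code; the struct and function names here are invented for the example), the up-then-down traversal via the least common ancestor can be sketched like this: walk the deeper state up to the shallower one's depth, then walk both up in lockstep until they meet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state node; the real structures in hsm.c differ. */
struct hsm_state {
    const char *name;
    struct hsm_state *parent;   /* NULL for the root state */
};

/* Number of ancestors between s and the root. */
static int hsm_depth(const struct hsm_state *s)
{
    int d = 0;
    while (s->parent) {
        s = s->parent;
        d++;
    }
    return d;
}

/* Least common ancestor of the "current" and "target" states. */
static const struct hsm_state *
hsm_lca(const struct hsm_state *a, const struct hsm_state *b)
{
    int da = hsm_depth(a), db = hsm_depth(b);

    /* Bring both states to the same depth... */
    while (da > db) { a = a->parent; da--; }
    while (db > da) { b = b->parent; db--; }

    /* ...then climb together until they coincide. */
    while (a != b) {
        a = a->parent;
        b = b->parent;
    }
    return a;
}
```

A transition would then run exit actions from the current state up to (but not including) the LCA, and entry actions from the LCA down to the target state.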
purge:

rc_: "recovery" (see below).

recovery: the process of (re-)establishing a stable cluster-wide cluster membership model, and a stable cluster-wide distribution of lock state and lock mastership directory information, after a node crashes.

resource: a lockable entity. It is identified by name and type (UNIX vs VMS), and is represented by a struct resource. Each node that knows about the resource keeps a copy of the resource structure. The "primary" or "master" node for the resource keeps track of all nodes' locks on the resource, but a "secondary" or "non-master" node knows only about its own locks. The structure supports three queues (grant, convert, and wait) for locks on the resource, as well as a lock value block shared by all locks on the resource.

SCN, scn_: "System Commit Number", a counter used to ??

secondary: a non-master node for a given resource. A secondary copy of the resource needs to know only about the locks on its own node, rather than across the whole cluster. See "primary" and "resource".

TWFG: the wait-for graph, used in deadlock detection (clm_deadlock.c).

udp:
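To make the "resource" entry above concrete, here is a minimal sketch of a per-node resource record, assuming invented field and type names (the real struct resource in OpenDLM has more fields and different names):

```c
#include <assert.h>
#include <stddef.h>

#define LVB_SIZE 32  /* OpenGFS needs 32 bytes; legacy VMS-compatible LVBs are 16 */

/* Placeholder for a per-lock record queued on the resource. */
struct lock_rec {
    struct lock_rec *next;
    /* ... per-lock state (mode, owner node, etc.) ... */
};

/* Hypothetical sketch of the resource structure described above. */
struct resource_sketch {
    char name[64];                /* resource name; with its type (UNIX vs VMS),
                                     this identifies the resource */
    int is_master;                /* nonzero on the primary (master) node */
    struct lock_rec *grant_q;     /* locks currently granted */
    struct lock_rec *convert_q;   /* locks waiting to convert to another mode */
    struct lock_rec *wait_q;      /* new lock requests not yet granted */
    unsigned char lvb[LVB_SIZE];  /* lock value block shared by all locks */
};
```

On a secondary node the three queues hold only that node's own locks; on the master node they track every node's locks on the resource.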
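The directory-node lookup described under "directory" above can be sketched as follows. This is NOT OpenDLM's actual hash formula (the function name and djb2-style hash are invented for the example); the point is only that every node computes the same directory node from the resource name and the current count of active nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical directory-node lookup: map a resource name to the index of
 * the node that holds its directory entry. Deterministic, so all cluster
 * members agree on the answer. */
static unsigned int dir_node_for(const char *resource_name, size_t name_len,
                                 unsigned int num_active_nodes)
{
    unsigned int h = 5381;  /* djb2-style string hash, chosen arbitrarily here */
    for (size_t i = 0; i < name_len; i++)
        h = h * 33 + (unsigned char)resource_name[i];
    return h % num_active_nodes;  /* node index in [0, num_active_nodes) */
}
```

Note that num_active_nodes appears in the formula: when cluster membership changes, some resources hash to a different node, which is why directory entries must migrate during recovery.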