HOWTO build and install OpenDLM (V0.01)

Authors:  Ben Cahill
          Stanley Wang

This document describes how to build, install, and configure OpenDLM.
It provides a simple example configuration for a 2-node cluster.

Within this document, we'll try to provide the basics of getting started,
without the need to study the various component projects before setting
up OpenDLM. However, you *should* study the projects sometime!

Recommended reading:

  OpenDLM:       WHATIS-opendlm, dlmbook_final.pdf (programmer's guide)
  HA heartbeat:  Getting Started, faqntips


SOFTWARE COMPONENTS:
--------------------

*OpenDLM* is a Distributed Lock Manager. It has no single point of
failure. OpenDLM distributes both lock management *and* lock storage
among all of the computer nodes in the cluster. If one of the nodes
crashes, recovery of relevant lock state is possible by the surviving
nodes.

OpenDLM exposes a native programming interface (described in the
Programmer's Guide) in both user space and kernel space, and also a
Service Availability Forum (SAF) Locking service interface in user space.

OpenDLM requires cluster membership services, to tell OpenDLM which
member nodes are active, and when a member node fails. Two possibilities
will be described in this document, both of which are part of the
linux-ha project:

  http://www.linux-ha.org/heartbeat/

*Heartbeat* has been the cluster manager of choice for a few years.

Just recently, *CCM* has been added to the linux-ha project. CCM is more
advanced than Heartbeat, and uses a consensus/voting algorithm to
maintain current membership status. OpenDLM accesses CCM via the SA Forum
Membership interface, exposed by CCM. Since this is a standard interface,
OpenDLM should actually be able to work with any implementation of the
SAF membership interface (but we haven't tried any except CCM).

We will use Heartbeat or CCM in only a very basic way, relative to their
full sets of capabilities. For example, we will not ask them to failover
any resources (e.g.
IP addresses). We will also use only one communication device (eth0) for
both Heartbeat and OpenDLM communication. However, you may wish to have
comm redundancy (e.g. serial port or separate ethernet card) in your
final setup. See additional information at the Heartbeat website.

*Libnet* is a networking library used by Heartbeat.


BUILDING AND INSTALLING OPENDLM
-------------------------------

The following instructions should cover all types of Linux distributions,
since they describe how to download source code tarballs and build from
scratch. For best results, we recommend following this download/build
procedure on each machine in the cluster (rather than building on a
single build machine, then installing in the cluster machines).

Alternatively, some of the projects have RPMs available.

1. Get libnet, HA heartbeat, and OpenDLM source code from:

   A. http://www.packetfactory.net/libnet

      We've used version 1.1.1 and the latest release available at the
      time of writing successfully.

      tar -xvzf tarball into a separate build directory.

   B. http://www.linux-ha.org/heartbeat/

      We've used 1.0.4 successfully, but recommend version 1.2.0 or
      later, which has a number of improvements. Build and config
      instructions in the steps below are oriented toward 1.2.x, and are
      a bit different than for 1.0.4. If you want to use the CCM
      membership service, you *must* use 1.2.1 or later.

      tar -xvzf tarball into a separate build directory, or use CVS:

        export CVSROOT=:pserver:firstname.lastname@example.org:/home/cvs/linux-ha
        cvs login          (use "guest" as the password)
        cvs co linux-ha

   C. http://opendlm.sourceforge.net

      You should use the *CVS* version of OpenDLM, since currently there
      is ongoing work to stabilize OpenDLM.

        export CVSROOT=:pserver:email@example.com:/cvsroot/opendlm
        cvs login          (just hit "enter" key for the password)
        cvs -z3 co opendlm (-z3 invokes compression, if desired)

      HINT: If you become a developer, don't try to modify/check-in code
      from within this anonymous tree on your computer.
      CVS will refuse your attempt as "anonymous", saying that you don't
      have permission, even if you have changed your CVSROOT to
      :ext:firstname.lastname@example.org/cvsroot/opendlm.

2. Build and install libnet:

     cd /your/path/to/libnet/
     ./configure           (no options required)
     make
   # make install          (root privilege required)

   Check for success:
     /usr/lib contains libnet.a
     /usr/include/libnet contains several .h files

3. Build and install HA heartbeat:

     cd /your/path/to/heartbeat-[version]/
     ./ConfigureMe make    (invokes ./bootstrap, ./configure, make)
   # make install          (root privileges required)

   HINT: If you fail in the "make" stage of ./ConfigureMe, when
   "Making all in libltdl", try removing the libltdl subdirectory, and
   then repeat ./ConfigureMe make. The libltdl subdirectory is not
   present in the CVS version, but *is* in the tarball version. It has
   several automatically generated files that have given us problems.

   Check for success:
     /usr/lib contains "heartbeat" subdirectory containing a number of
     files

4. Build and install OpenDLM:

   IMPORTANT: Use CVS code base! (see step 1.C. above).

   IMPORTANT: You should verify the value of "MAXLOCKVAL" (the size of a
   Lock Value Block) in /your/path/to/opendlm/src/include/dlm.h. Make
   sure that it is 32, required to hold all of the data OpenGFS places in
   the LVBs. Current CVS is 32, so you should be okay as-is. Older
   versions were 16, and are not compatible with 32.

   NOTE: OpenDLM requires access to kernel source when building.
   If you will be using OpenDLM with a kernel *other* than the running
   kernel (type "uname -r" to see running kernel), for example if you've
   prepared a specially patched kernel for use with, say, OpenGFS, but
   you're running a different kernel right now, use the following option
   with the ./configure command below:

     --with-linux-srcdir=/your/path/to/linux-[version]

     cd /your/path/to/opendlm/
     ./bootstrap
     ./configure           (with options below)
         (for heartbeat membership management (default): )
         --with-heartbeat_includes=/your/path/to/heartbeat-[ver]/include
         (OR, for CCM membership management: )
         --with-ccm
         --with-ccm_includes=/your/path/to/heartbeat-[ver]/include
     make
   # make install          (root privileges required)

   Check for success:
     /lib/modules/[version]/dlm contains cccp.o, among others.

5. Configure HA Heartbeat:

   Heartbeat requires 3 configuration files (identical in each node) for
   proper operation. Root privileges are required for creating these.

   A. Create an ha.cf config file.

      The "node" lines, below, contain "name*", which are placeholders
      for the names of the cluster member computers. Type the following
      command on each cluster member to determine its name:

      # uname -n

      The following file must appear as: /etc/ha.d/ha.cf

      logfacility syslog   # use syslog for log/debug output
      bcast eth0           # use eth0 for heartbeat communication
      auto_failback on     # avoids a warning, even though we're not using failover
      node name1           # uname -n of node 1
      node name2           # uname -n of node 2
      apiauth ccm gid=root uid=root        # this and following lines authorize
      apiauth heartbeat gid=root uid=root  # certain programs to use heartbeat API
      apiauth default gid=root uid=root    #

      For our example, we use default values (no entry) for all other
      parameters. See heartbeat's source tree doc/ha.cf for more info.
      You might want to set things up differently than our example.

   B. Create an haresources file.

      It tells heartbeat which resources (e.g. applications, IP
      addresses, etc.) to failover when a node fails.
      In our case, we are not using the failover feature, but the file is
      required anyway. Just create an empty file (or a file with one line
      return), appearing as: /etc/ha.d/haresources

   C. Create an authentication keys file.

      We'll assume that you have a secure network, so we'll use the
      computationally cheapest method, crc.

      The following file must appear as: /etc/ha.d/authkeys

      auth 1
      1 crc

      After creating, change its privileges to 600:

      # chmod 600 /etc/ha.d/authkeys

   -----------

   NOTE: All 3 files should be the same on the two cluster nodes.

   NOTE: We have not set up a STONITH ("Shoot The Other Node In The
   Head") method for this installation. For a clustered filesystem,
   STONITH is vital for protecting the shared data from getting clobbered
   by a wayward node. It's also an important tool for high availability,
   to make sure a wayward node reboots fairly quickly.

   See the "Getting Started With Linux-HA (heartbeat)" document for
   information on STONITH, other parameters and their default values in
   the configuration files, and the much more extensive capabilities of
   heartbeat, at:

     http://www.linux-ha.org/download/GettingStarted.html

6. Configure OpenDLM:

   OpenDLM requires one configuration file, and an edit of modules.conf.
   As with heartbeat, root privileges are required for all of this.

   A. Create the configuration file:

      As in step 5A, the "name*" placeholders are for the uname -n names
      of the member nodes. In this case, the IP address of each node is
      required as well.

      The following file must appear as: /etc/dlm.conf

      NODECOUNT 2
      1 name1 192.168.0.37
      2 name2 192.168.0.203
      DLMNAME haDLM
      DLMMAJOR 250
      DLMCMGR heartbeat    (or ccm)
      DLMADMIN admin 0
      DLMLOCKS locks 1

      NOTE: Make sure that DLMCMGR indicates the (cluster) membership
      manager that you intend to use, either heartbeat or ccm.

      NOTE: dlm.conf should be the same on the two nodes, and make sure
      that "locks" is not "lock"!

      NOTE FOR USERS OF OPENGFS: The order of nodes is meaningful.
      Node 1 will use OpenGFS journal 0, Node 2 will use journal 1, etc.
      If you are switching from using memexp, order should be the same as
      in OpenGFS's config file for the cluster information device (see
      OpenGFS' HOWTO-nopool). This will maintain the same journal
      assignments that you had set up via the cidev.

   B. Modify /etc/modules.conf to include the following line, to point
      to the dlmdk.core module when trying to load "haDLM":

      alias haDLM dlmdk.core

      Now, before doing the following, make sure that you are running the
      kernel for which OpenDLM was built (see step 4 above). Then, to
      update the module dependency file /lib/modules/*/modules.dep,
      execute:

      # depmod -a

      HINT: If you're not running the kernel for which OpenDLM was built,
      this step will modify the wrong dependency file.

7. Start locking service:

   Make sure that you are running the kernel for which OpenDLM was built
   (see step 4 above). You'll need root privileges for all of the
   following steps:

   A. Start HA heartbeat on each computer:

      On name1:
      # /etc/init.d/heartbeat start
      On name2:
      # /etc/init.d/heartbeat start

      Check for success:
        Command line response indicates success.
        /var/log/messages shows success.

      HINT: You can use other files/facilities as your log output. See
      doc/ha.cf in the heartbeat source tree.

      HINT: heartbeat currently seems to have problems with the NPTL
      Posix threads library. For example, if you are using RedHat RHEL3
      or 9, and having problems with "PID mismatch", try disabling NPTL
      by:

      # export LD_ASSUME_KERNEL=2.4

      Then restart heartbeat via:

      # /etc/init.d/heartbeat stop
      # /etc/init.d/heartbeat start

   B. If you're using CCM, start CCM on each computer, to run in the
      background (&):

      On name1:
      # /usr/lib/heartbeat/ccm &
      On name2:
      # /usr/lib/heartbeat/ccm &

   C. Start OpenDLM on each computer:

      On name1:
      # /usr/local/sbin/dlmdu -C /etc/dlm.conf
      On name2:
      # /usr/local/sbin/dlmdu -C /etc/dlm.conf

      Check for success:
        Command line response indicates success.
        File /proc/cccp and directory /proc/haDLM exist.
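
The /proc check just described can be scripted. The following is only a
minimal sketch, assuming the /proc entries are named exactly as stated in
this document. The function name check_dlm_proc and its optional
directory-prefix argument are hypothetical (ours, for illustration), and
are not part of OpenDLM.

```shell
#!/bin/sh
# Sketch: verify the /proc entries that this HOWTO says exist once dlmdu
# has started. check_dlm_proc is a hypothetical helper, not an OpenDLM tool.
# Optional $1 is a directory prefix (default: empty, i.e. the real /proc).
check_dlm_proc() {
    root="${1:-}"
    status=0
    # /proc/cccp should exist (the HOWTO describes it as a file).
    if [ -e "$root/proc/cccp" ]; then
        echo "ok: $root/proc/cccp exists"
    else
        echo "MISSING: $root/proc/cccp"
        status=1
    fi
    # /proc/haDLM should be a directory.
    if [ -d "$root/proc/haDLM" ]; then
        echo "ok: $root/proc/haDLM exists"
    else
        echo "MISSING: $root/proc/haDLM"
        status=1
    fi
    return "$status"
}
```

After sourcing the function on each node, something like
"check_dlm_proc || echo OpenDLM does not appear to be running" reports a
failed start. The directory name haDLM follows the DLMNAME value used in
the example dlm.conf; if you chose a different name, adjust accordingly.
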
      Check for success:
        Try using a test app within the OpenDLM source tree:
        src/user/tests/simpleclient

      If you have a problem, double check your /etc/dlm.conf files,
      e.g. "locks" (not "lock").

      HINT: A recurring message "Condition timeout, ..." just means that
      the cluster configuration has not changed recently. Nothing to
      worry about.

   D. Insert the OpenDLM kernel module (root privileges required):

      On name1:
      # modprobe libdlmk
      On name2:
      # modprobe libdlmk

      Check for success:
        cat /proc/modules shows "libdlmk" among others

      HINT: If libdlmk fails to install, you may not have started OpenDLM
      successfully. See step C above.

That's it, you are done! You should now be able to use OpenDLM as the
lock manager for user-space applications, as well as kernel-space
entities such as the OpenGFS filesystem.


SHUTTING DOWN CLEANLY
---------------------

1. Stop OpenDLM and HA heartbeat:

   # killall dlmdu
   # /etc/init.d/heartbeat stop    (this also kills ccm)

2. Unload the modules:

   # modprobe -r libdlmk
   # modprobe -r dlmdk.core


STARTING OpenDLM (e.g. after boot-up)
-------------------------------------

Once OpenDLM has been installed on your computers, only a few steps are
needed to get it going after a boot-up. You will need root privilege for
all steps below:

1. Start heartbeat (on each computer).

   # /etc/init.d/heartbeat start

2. If using CCM, start ccm (on each computer).

   # /usr/lib/heartbeat/ccm &

3. Start OpenDLM (on each computer, after all nodes' heartbeats are
   started).

   # /usr/local/sbin/dlmdu -C /etc/dlm.conf

4. Load OpenDLM kernel modules (on each computer).

   # modprobe libdlmk

   Check for success:
     cat /proc/modules shows "libdlmk" among others


Copyright 2002-2004 The OpenGFS Project
Portions copyright 2004 The OpenDLM Project
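
As a convenience, the boot-up steps under "STARTING OpenDLM" can be
collected in one place. The sketch below only *prints* the per-node
commands in the order given in this document; it does not run them. The
function name dlm_start_cmds is hypothetical, and the ccm path is the one
used in step 7B, which we assume matches your installation.

```shell
#!/bin/sh
# Sketch: print the per-node start-up commands from "STARTING OpenDLM",
# in order. dlm_start_cmds is a hypothetical helper, not an OpenDLM tool.
# $1 selects the membership manager: "heartbeat" (default) or "ccm".
dlm_start_cmds() {
    cmgr="${1:-heartbeat}"
    # 1. Heartbeat is always started first.
    echo "/etc/init.d/heartbeat start"
    # 2. CCM runs on top of heartbeat, in the background, only if selected.
    if [ "$cmgr" = "ccm" ]; then
        echo "/usr/lib/heartbeat/ccm &"
    fi
    # 3. The OpenDLM daemon, using the configuration file from step 6A.
    echo "/usr/local/sbin/dlmdu -C /etc/dlm.conf"
    # 4. The OpenDLM kernel module.
    echo "modprobe libdlmk"
}
```

Calling "dlm_start_cmds ccm" lists the four commands for a CCM setup. The
sketch deliberately stops short of executing anything: dlmdu should be
started only after heartbeat is up on *all* nodes, which a script running
on a single node cannot verify by itself.
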