HOWTO Build, Install, and Configure OpenDLM (Jun 16 2004)


              HOWTO build and install OpenDLM (V0.01)

Authors:
Ben Cahill
Stanley Wang

This document describes how to build, install, and configure OpenDLM.
It provides a simple example configuration for a 2-node cluster.

Within this document, we'll try to provide the basics of getting started,
without the need to study the various component projects before setting up
OpenDLM.  However, you *should* study the projects sometime!  Recommended
reading:

OpenDLM:  WHATIS-opendlm, dlmbook_final.pdf (programmer's guide)
HA heartbeat:  Getting Started, faqntips


SOFTWARE COMPONENTS:
--------------------

*OpenDLM* is a Distributed Lock Manager.  It has no single point of failure.
OpenDLM distributes both lock management *and* lock storage among all of
the computer nodes in the cluster.  If one of the nodes crashes, recovery of
relevant lock state is possible by the surviving nodes.  OpenDLM exposes
a native programming interface (described in the Programmer's Guide) in both
user space and kernel space, and also a Service Availability Forum (SAF)
Locking service interface in user space.

OpenDLM requires cluster membership services, to tell OpenDLM which member
nodes are active, and when a member node fails.  Two possibilities will be
described in this document, both of which are part of the linux-ha project:

http://www.linux-ha.org/heartbeat/

*Heartbeat* has been the cluster manager of choice for a few years.  Just
recently, *CCM* has been added to the linux-ha project.  CCM is more advanced
than Heartbeat, and uses a consensus/voting algorithm to maintain current
membership status.

OpenDLM accesses CCM via the SA Forum Membership interface, exposed by CCM.
Since this is a standard interface, OpenDLM should actually be able to work
with any implementation of the SAF membership interface (but we haven't
tried any except CCM).

We will use Heartbeat or CCM in only a very basic way, relative to their
full sets of capabilities.  For example, we will not ask them to failover
any resources (e.g. IP addresses).  We will also use only one communication
device (eth0) for both Heartbeat and OpenDLM communication.  However, you may
wish to have comm redundancy (e.g. serial port or separate ethernet card) in
your final setup.  See additional information at the Heartbeat website.

*Libnet* is a networking library used by Heartbeat.


BUILDING AND INSTALLING OPENDLM
-------------------------------

The following instructions should cover all types of Linux distributions,
since they describe how to download source code tarballs and build from scratch.
For best results, we recommend following this download/build procedure on
each machine in the cluster (rather than building on a single build machine,
then installing in the cluster machines).  Alternatively, some of the projects
have RPMs available.


1. Get libnet, HA heartbeat, and OpenDLM source code from:

	A.  http://www.packetfactory.net/libnet

	We've used version 1.1.1 and the latest, 1.1.2.1, successfully.

	tar -xvzf tarball into a separate build directory.


	B.  http://www.linux-ha.org/heartbeat/

	We've used 1.0.4 successfully, but recommend version 1.2.0 or later,
	which has a number of improvements.  Build and config instructions
	in steps below are oriented toward 1.2.x, and are a bit different
	than for 1.0.4.  If you want to use the CCM membership service,
	you *must* use 1.2.1 or later.

	tar -xvzf tarball into a separate build directory, or use CVS:

	export CVSROOT=:pserver:guest@cvs.linux-ha.org:/home/cvs/linux-ha
	cvs login          (use "guest" as the password)
	cvs co linux-ha


	C.  http://opendlm.sourceforge.net

	You should use the *CVS* version of OpenDLM, since currently there is
	ongoing work to stabilize OpenDLM.

	export CVSROOT=:pserver:anonymous@cvs.sourceforge.net:/cvsroot/opendlm
	cvs login             (just hit "enter" key for the password)
	cvs -z3 co opendlm    (-z3 invokes compression, if desired)

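	HINT:  If you become a developer, don't try to modify/check-in code
	from within this anonymous tree on your computer.  CVS will refuse
	your attempt as "anonymous", saying that you don't have permission,
	even if you have changed your CVSROOT to
	:ext:yourlogin@cvs.sourceforge.net:/cvsroot/opendlm, because each
	checked-out directory records its original root in its CVS/Root file.
	Check out a separate tree with your developer CVSROOT instead.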


2. Build and install libnet:

	cd /your/path/to/libnet/
	./configure     (no options required)
	make
	# make install  (root privilege required)

	Check for success:  /usr/lib contains libnet.a
		/usr/include/libnet contains several .h files
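
	The check above can be scripted.  The following is only a sketch:
	the function name check_libnet and its optional PREFIX argument are
	ours, not part of libnet.  It assumes the default install prefix
	/usr described above.

```shell
# check_libnet [PREFIX]
# Succeeds if PREFIX/lib/libnet.a exists and PREFIX/include/libnet holds
# at least one .h file.  PREFIX defaults to /usr, as in the step above.
check_libnet() {
	prefix=${1:-/usr}
	if [ ! -f "$prefix/lib/libnet.a" ]; then
		echo "missing: $prefix/lib/libnet.a" >&2
		return 1
	fi
	# Reuse the positional parameters to expand the glob safely; if
	# nothing matches, $1 is the literal pattern and the -f test fails.
	set -- "$prefix"/include/libnet/*.h
	if [ ! -f "$1" ]; then
		echo "missing: headers in $prefix/include/libnet" >&2
		return 1
	fi
	echo "libnet looks installed under $prefix"
}
```

	Run "check_libnet" with no argument after "make install".  If you
	installed under a different prefix, pass it as the argument.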


3. Build and install HA heartbeat:

	cd /your/path/to/heartbeat-[version]/
	./ConfigureMe make   (invokes ./bootstrap, ./configure, make)
	# make install       (root privileges required)

	HINT:  If you fail in the "make" stage of ./ConfigureMe, when
	"Making all in libltdl", try removing the libltdl subdirectory,
	and then repeat ./ConfigureMe make.  The libltdl subdirectory is not
	present in the CVS version, but *is* in the tarball version.  It has
	several automatically generated files that have given us problems.

	Check for success:  /usr/lib contains "heartbeat" subdirectory
			containing a number of files


4. Build and install OpenDLM:

	IMPORTANT:  Use CVS code base!  (see step 1.C. above).

	IMPORTANT:  You should verify the value of "MAXLOCKVAL" (the size
	of a Lock Value Block) in /your/path/to/opendlm/src/include/dlm.h.
	Make sure that it is 32, required to hold all of the data OpenGFS
	places in the LVBs.  Current CVS is 32, so you should be okay as-is.
	Older versions were 16, and are not compatible with 32.
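
	A quick way to read the value without opening the file is sketched
	below.  The helper name maxlockval is ours.  It ASSUMES that dlm.h
	defines the constant with a plain "#define MAXLOCKVAL <n>" line;
	if your copy defines it differently, inspect the header by hand.

```shell
# maxlockval HEADER
# Prints the value of the first "#define MAXLOCKVAL <n>" line in HEADER.
# ASSUMPTION: the constant is a plain #define; otherwise this prints
# nothing and returns non-zero.
maxlockval() {
	awk '$1 == "#define" && $2 == "MAXLOCKVAL" { print $3; found = 1; exit }
	     END { exit !found }' "$1"
}

# Example (path placeholder as in the text above):
# [ "$(maxlockval /your/path/to/opendlm/src/include/dlm.h)" = 32 ] ||
#	echo "MAXLOCKVAL is not 32" >&2
```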

	NOTE:  OpenDLM requires access to kernel source when building.  
	If you will be using OpenDLM with a kernel *other* than the running
	kernel (type "uname -r" to see running kernel), for example if you've
	prepared a specially patched kernel for use with, say, OpenGFS, but
	you're running a different kernel right now, use the following
	option with the ./configure command below:

	--with-linux-srcdir=/your/path/to/linux-[version]

	cd /your/path/to/opendlm/
	./bootstrap
	./configure	(with options below)
			(for heartbeat membership management (default): )
		--with-heartbeat_includes=/your/path/to/heartbeat-[ver]/include
			(OR, for CCM membership management: )
		--with-ccm
		--with-ccm_includes=/your/path/to/heartbeat-[ver]/include
	make
	# make install  (root privileges required)

	Check for success:  /lib/modules/[version]/dlm contains cccp.o,
		among others.
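
	Because the ./configure options above are alternatives, it can help
	to see them assembled into one command line.  This sketch only
	prints the command; the function name dlm_configure_cmd is ours,
	and the option names are exactly those listed in this step.

```shell
# dlm_configure_cmd MANAGER HB_INCLUDES [KERNEL_SRC]
# Prints the ./configure command line for the chosen membership manager
# ("heartbeat" or "ccm").  It only prints; run the result yourself from
# the opendlm source directory after ./bootstrap.
dlm_configure_cmd() {
	manager=$1
	hb_includes=$2
	kernel_src=${3:-}
	case $manager in
	heartbeat) opts="--with-heartbeat_includes=$hb_includes" ;;
	ccm)       opts="--with-ccm --with-ccm_includes=$hb_includes" ;;
	*)         echo "unknown membership manager: $manager" >&2; return 1 ;;
	esac
	# Only needed when building for a kernel other than the running one.
	if [ -n "$kernel_src" ]; then
		opts="$opts --with-linux-srcdir=$kernel_src"
	fi
	echo "./configure $opts"
}
```

	For example, "dlm_configure_cmd ccm /your/path/to/heartbeat-[ver]/include"
	prints a command with both --with-ccm and --with-ccm_includes set.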


5.  Configure HA Heartbeat:
	Heartbeat requires 3 configuration files (identical on each node)
	for proper operation.  Root privileges are required for creating these.

	A.  Create an ha.cf config file.
	
	The "node" lines, below, contain "name*", which are placeholders for
	the names of the cluster member computers.  Type the following command
	on each cluster member to determine its name:

	# uname -n

	The following file must appear as:

	/etc/ha.d/ha.cf

logfacility syslog      # use syslog for log/debug output
bcast   eth0            # use eth0 for heartbeat communication
auto_failback on        # avoids a warning, even though we're not using failover
node   name1		# uname -n of node 1
node   name2		# uname -n of node 2
apiauth ccm gid=root uid=root       # this and following lines authorize
apiauth heartbeat gid=root uid=root #   certain programs to use heartbeat API
apiauth default gid=root uid=root   #

	For our example, we use default values (no entry) for all other
	parameters.  See heartbeat's source tree doc/ha.cf for more info.
	You might want to set things up differently than our example.
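
	Since the file must be identical on every node, you may prefer to
	generate it rather than type it twice.  A minimal sketch, assuming
	the example settings above; the function name write_ha_cf is ours:

```shell
# write_ha_cf FILE NODE...
# Writes the example ha.cf shown above to FILE, with one "node" line
# per NODE argument (each NODE is the "uname -n" of a cluster member).
write_ha_cf() {
	out=$1
	shift
	[ "$#" -ge 1 ] || { echo "need at least one node name" >&2; return 1; }
	{
		echo "logfacility syslog"
		echo "bcast   eth0"
		echo "auto_failback on"
		for name in "$@"; do
			echo "node   $name"
		done
		echo "apiauth ccm gid=root uid=root"
		echo "apiauth heartbeat gid=root uid=root"
		echo "apiauth default gid=root uid=root"
	} > "$out"
}

# Example (as root): write_ha_cf /etc/ha.d/ha.cf name1 name2
```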


	B.  Create an haresources file.  It tells heartbeat which resources
	(e.g. applications, IP addresses, etc.) to failover when a node fails.
	In our case, we are not using the failover feature, but the file is
	required anyway.  Just create an empty file (or a file containing
	only a single newline), appearing as:

	/etc/ha.d/haresources


	C.  Create an authentication keys file.  We'll assume that you have a
	secure network, so we'll use the computationally cheapest method, crc.
	The following file must appear as:

	/etc/ha.d/authkeys

auth 1
1 crc

	After creating it, change its permissions to 600:

	# chmod 600 /etc/ha.d/authkeys
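
	Steps B and C can be done in one go.  The sketch below (function
	name write_ha_aux is ours) creates both files in a directory you
	name, and sets a restrictive umask so that authkeys is never
	readable by others, even briefly.

```shell
# write_ha_aux DIR
# Creates an empty haresources and a crc authkeys (mode 600) in DIR.
write_ha_aux() {
	dir=$1
	: > "$dir/haresources" || return 1
	# The subshell keeps the umask change from leaking to the caller.
	( umask 077 && printf 'auth 1\n1 crc\n' > "$dir/authkeys" ) || return 1
	# Set the mode explicitly too, in case the file already existed.
	chmod 600 "$dir/authkeys"
}

# Example (as root): write_ha_aux /etc/ha.d
```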

	-----------

	NOTE:  All 3 files should be the same on both cluster nodes.

	NOTE:  We have not set up a STONITH ("Shoot The Other Node In The
	Head") method for this installation.  For a clustered filesystem,
	STONITH is vital for protecting the shared data from getting clobbered
	by a wayward node.  It's also an important tool for high availability,
	to make sure a wayward node reboots fairly quickly.  See the
	"Getting Started With Linux-HA (heartbeat)" document for information
	on STONITH, other parameters and their default values in the
	configuration files, and the much more extensive capabilities of
	heartbeat, at:

	http://www.linux-ha.org/download/GettingStarted.html


6.  Configure OpenDLM:
	OpenDLM requires one configuration file, and an edit of modules.conf.
	As with heartbeat, root privileges are required for all of this.

	A.  Create the configuration file:

	As in step 5.A, the "name*" placeholders are for the uname -n names of
	the member nodes.  In this file, each node's IP address is also required.
	The following file must appear as:

	/etc/dlm.conf

NODECOUNT 2
1 name1 192.168.0.37
2 name2 192.168.0.203
DLMNAME haDLM
DLMMAJOR 250
DLMCMGR heartbeat
DLMADMIN admin 0
DLMLOCKS locks 1

	NOTE:  Make sure that DLMCMGR names the (cluster) membership
	manager that you intend to use:  "heartbeat" (as shown) or "ccm".

	NOTE:  dlm.conf should be the same on both nodes.  Also make sure
	that the DLMLOCKS name is "locks", not "lock"!

	NOTE FOR USERS OF OPENGFS:  The order of nodes is meaningful.  Node 1
	will use OpenGFS journal 0, Node 2 will use journal 1, etc.  If you are
	switching from using memexp, order should be the same as in OpenGFS's
	config file for the cluster information device (see OpenGFS'
	HOWTO-nopool).  This will maintain the same journal assignments that
	you had set up via the cidev.
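
	The two mistakes we see most often in dlm.conf (a NODECOUNT that
	doesn't match the node list, and "lock" instead of "locks") are easy
	to catch mechanically.  The checker below is only a sketch, written
	against the example file above; the name check_dlm_conf is ours and
	it is not part of OpenDLM.

```shell
# check_dlm_conf FILE
# Sanity checks: NODECOUNT equals the number of "<id> <name> <ip>" lines,
# and the DLMLOCKS name is "locks".  Prints each problem found and
# returns non-zero if there is any.
check_dlm_conf() {
	awk '
	$1 == "NODECOUNT"           { want = $2 }
	$1 ~ /^[0-9]+$/ && NF == 3  { have++ }
	$1 == "DLMLOCKS"            { locks = $2 }
	END {
		if (want + 0 != have + 0) {
			printf "NODECOUNT is %s but %d node lines found\n", want, have
			bad = 1
		}
		if (locks != "locks") {
			printf "DLMLOCKS name is \"%s\", expected \"locks\"\n", locks
			bad = 1
		}
		exit bad + 0
	}' "$1"
}

# Example: check_dlm_conf /etc/dlm.conf && echo "dlm.conf looks sane"
```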


	B.  Modify /etc/modules.conf to include the following line, to point
	to the dlmdk.core module when trying to load "haDLM":

alias haDLM dlmdk.core

	Now, before doing the following, make sure that you are running the
	kernel for which OpenDLM was built (see step 4 above).

	Then, to update the module dependency file /lib/modules/*/modules.dep,
	execute:

	# depmod -a

	HINT:  If you're not running the kernel for which OpenDLM was built,
	this step will modify the wrong dependency file.
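
	The edit and the kernel check can be combined as sketched below.
	Both function names are ours.  The second one relies on the
	/lib/modules/[version]/dlm directory mentioned under "Check for
	success" in step 4.

```shell
# add_dlm_alias [FILE]
# Appends the alias line to FILE (default /etc/modules.conf) unless an
# identical line is already present, so it is safe to run twice.
add_dlm_alias() {
	conf=${1:-/etc/modules.conf}
	line='alias haDLM dlmdk.core'
	grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
}

# dlm_built_for_running_kernel [MODULES_ROOT]
# Succeeds if the dlm module directory exists for the running kernel.
dlm_built_for_running_kernel() {
	[ -d "${1:-/lib/modules}/$(uname -r)/dlm" ]
}

# Example (as root):
# add_dlm_alias && dlm_built_for_running_kernel && depmod -a
```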


7.  Start locking service:

	Make sure that you are running the kernel for which OpenDLM was built
	(see step 4 above).

	You'll need root privileges for all of the following steps:


	A.  Start HA heartbeat on each computer:

	On name1:   # /etc/init.d/heartbeat start
	On name2:   # /etc/init.d/heartbeat start

	Check for success:  Command line response indicates success.
		/var/log/messages shows success.

	HINT:  You can use other files/facilities as your log output.  See
	doc/ha.cf in the heartbeat source tree.

	HINT:  heartbeat currently seems to have problems with the NPTL POSIX
	threads library.  For example, if you are using Red Hat Enterprise
	Linux 3 or Red Hat Linux 9 and see "PID mismatch" problems, try
	disabling NPTL in the shell from which you start heartbeat:
		# export LD_ASSUME_KERNEL=2.4

		Then restart heartbeat via:
		# /etc/init.d/heartbeat stop
		# /etc/init.d/heartbeat start
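
	If you are not sure which threads library your system uses, glibc
	can usually tell you via "getconf GNU_LIBPTHREAD_VERSION", which
	prints a string beginning with "NPTL" or "linuxthreads".  We assume
	here that your glibc supports that getconf variable; the helper name
	is_nptl is ours.

```shell
# is_nptl VERSION_STRING
# Succeeds if VERSION_STRING (as printed by getconf GNU_LIBPTHREAD_VERSION)
# names NPTL rather than the older LinuxThreads.
is_nptl() {
	case $1 in
	NPTL*) return 0 ;;
	*)     return 1 ;;
	esac
}

# Example: apply the workaround only when NPTL is in use.
# if is_nptl "$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)"; then
#	export LD_ASSUME_KERNEL=2.4
# fi
```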


	B.  If you're using CCM, start CCM on each computer, to run in
	the background (&):

	On name1:   # /usr/lib/heartbeat/ccm &
	On name2:   # /usr/lib/heartbeat/ccm &


	C.  Start OpenDLM on each computer:

	On name1:   # /usr/local/sbin/dlmdu -C /etc/dlm.conf
	On name2:   # /usr/local/sbin/dlmdu -C /etc/dlm.conf

	Check for success:  Command line response indicates success.
		File /proc/cccp and directory /proc/haDLM exist.

	Check for success:  Try using a test app within OpenDLM source tree:
		src/user/tests/simpleclient
		If you have a problem, double check your /etc/dlm.conf files,
		e.g. "locks" (not "lock").

	HINT:  A recurring message "Condition timeout, ..." just means that
	the cluster configuration has not changed recently.  Nothing to
	worry about.
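
	The /proc check above is easy to script.  In this sketch the
	function name check_dlm_proc is ours, and "haDLM" is the DLMNAME
	from our example /etc/dlm.conf; adjust it if you chose another name.

```shell
# check_dlm_proc [PROC_ROOT]
# Succeeds if PROC_ROOT/cccp exists and PROC_ROOT/haDLM is a directory.
# PROC_ROOT defaults to /proc.
check_dlm_proc() {
	proc=${1:-/proc}
	[ -e "$proc/cccp" ] || { echo "missing file: $proc/cccp" >&2; return 1; }
	[ -d "$proc/haDLM" ] || { echo "missing directory: $proc/haDLM" >&2; return 1; }
}

# Example: check_dlm_proc && echo "OpenDLM appears to be running"
```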


	D.  Insert the OpenDLM kernel module (root privileges required):

	On name1:   # modprobe libdlmk
	On name2:   # modprobe libdlmk

	Check for success:  cat /proc/modules shows "libdlmk" among others

	HINT:  If libdlmk fails to install, you may not have started OpenDLM
	successfully.  See step C above.
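
	Rather than reading through the output of cat /proc/modules by eye,
	you can test for an exact module name.  A sketch; the function name
	module_loaded is ours.  It compares the whole first field, so
	"libdlm" does not match "libdlmk".

```shell
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME is the first field of some line in MODULES_FILE
# (default /proc/modules).
module_loaded() {
	awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
		"${2:-/proc/modules}"
}

# Example: module_loaded libdlmk || echo "libdlmk is not loaded" >&2
```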


That's it, you are done!
You should now be able to use OpenDLM as the lock manager for user-space
applications, as well as kernel-space entities such as the OpenGFS filesystem.


SHUTTING DOWN CLEANLY
---------------------

1. Stop OpenDLM and HA heartbeat:
	# killall dlmdu
	# /etc/init.d/heartbeat stop  (this also kills ccm)

2. Unload the modules:
	# modprobe -r libdlmk
	# modprobe -r dlmdk.core


STARTING OpenDLM (e.g. after boot-up)
------------------------------------
Once OpenDLM has been installed on your computers, only a few steps are needed
to get it going after a boot-up.  You will need root privilege for all steps
below:

1.  Start heartbeat (on each computer).

	# /etc/init.d/heartbeat start

2.  If using CCM, start ccm (on each computer).

	# /usr/lib/heartbeat/ccm &

3.  Start OpenDLM (on each computer, after all nodes' heartbeats are started).

	# /usr/local/sbin/dlmdu -C /etc/dlm.conf

4.  Load OpenDLM kernel modules (on each computer).

	# modprobe libdlmk

	Check for success:  cat /proc/modules shows "libdlmk" among others
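
	The four steps above can be wrapped in one function, run as root on
	each computer.  This is a sketch only: the names run and
	start_dlm_stack and the DRY_RUN variable are ours, the ccm path is
	the installed one from step 7.B, and the script cannot know whether
	the *other* nodes' heartbeats are up, so you must still take care
	of that ordering yourself.

```shell
# run CMD [ARG...]
# Executes CMD, or just prints it when DRY_RUN=1.
run() {
	if [ "${DRY_RUN:-0}" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# start_dlm_stack [MANAGER]
# Runs the start-up steps in order, stopping at the first failure.
# MANAGER is "heartbeat" (default) or "ccm".
start_dlm_stack() {
	manager=${1:-heartbeat}
	run /etc/init.d/heartbeat start || return 1
	if [ "$manager" = ccm ]; then
		# ccm must keep running in the background.
		if [ "${DRY_RUN:-0}" = 1 ]; then
			echo "/usr/lib/heartbeat/ccm &"
		else
			/usr/lib/heartbeat/ccm &
		fi
	fi
	run /usr/local/sbin/dlmdu -C /etc/dlm.conf || return 1
	run modprobe libdlmk
}
```

	Try "DRY_RUN=1 start_dlm_stack ccm" first; it prints the commands
	it would run without executing any of them.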


Copyright 2002-2004 The OpenGFS Project
Portions copyright 2004 The OpenDLM Project