Thursday, June 24, 2010

How failover and load balancing work in a Domino cluster

How failover works
A cluster's ability to redirect requests from one server to another is called failover. When a user tries to access a database on a server that is unavailable or in heavy use, Domino directs the user to a replica of the database on another server in the cluster.

The Cluster Manager on each cluster server sends out probes to each of the other cluster servers to determine the availability of each server. The Cluster Manager also checks continually to see which replicas are available on each server. When a user tries to access a database that is not available, the user request is redirected to a replica of the database on a different server in the cluster. Although the user connects to a replica on a different server, failover is essentially transparent to the user.
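
If you need to change how often these probes go out, the probe interval can be set in the NOTES.INI file. A minimal sketch, assuming the standard Server_Cluster_Probe_Timeout setting (interval in minutes); the value shown is simply the default:

  Server_Cluster_Probe_Timeout=1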

Example
This example describes the process that Domino uses when it fails over. This cluster contains three servers. Server 1 is currently unavailable. The Cluster Managers on Server 2 and Server 3 are aware that Server 1 is unavailable.

Failover in a cluster

1. A Notes user attempts to open a database on Server 1.

2. Notes realizes that Server 1 is not responding.

3. Instead of displaying a message that says the server is not responding, Notes looks in its cluster cache to see if this server is a member of a cluster and to find the names of the other servers in the cluster. (When a Notes client first accesses a server in a cluster, the names of all the servers in the cluster are added to the cluster cache on the client. This cache is updated every 15 minutes.)

4. Notes accesses the Cluster Manager on the next server listed in the cluster cache.

5. The Cluster Manager looks in the Cluster Database Directory to find which servers in the cluster contain a replica of the desired database.

6. The Cluster Manager looks in its server cluster cache to find the availability of each server that contains a replica. (The server cluster cache contains information about all the servers in the cluster. Cluster servers obtain this information when they send probes to the other cluster servers.)

7. The Cluster Manager creates a list of the servers in the cluster that contain a replica of the database, sorts the list in order of availability, and sends the list to Notes.

8. Notes opens the replica on the first server in the list (the most available server). If that server is no longer available, Notes opens the replica on the next server in the list. In this example, Server 2 was the most available server.

When the Notes client shuts down, it stores the contents of the cluster cache in the file CLUSTER.NCF. Each time the client starts, it populates the cluster cache from the information in CLUSTER.NCF.


How workload balancing works
By distributing databases throughout the cluster, you balance the workload in the cluster so that no server is overloaded. In addition, there are several NOTES.INI variables you can set to help balance the workload. For example, you can specify a limit on how busy a server can get by specifying an availability threshold. When the server reaches the availability threshold, the Cluster Manager marks the server BUSY. When a server is BUSY, requests to open databases are sent to other servers that contain replicas of the requested databases. You can also specify the maximum number of users you want to access a server. When the server reaches this limit, users are redirected to another server. This keeps the workload balanced and keeps the server working at optimum performance.
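
Both limits are NOTES.INI settings. A minimal sketch, with illustrative values only (the right numbers depend on your hardware and workload):

  Server_Availability_Threshold=85
  Server_MaxUsers=250

Server_Availability_Threshold is a percentage from 0 to 100; when the server's availability index falls below it, the Cluster Manager marks the server BUSY. Server_MaxUsers caps the number of active users on the server.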

When a user tries to open a database on a BUSY server, the Cluster Manager looks in the Cluster Database Directory for a replica of that database. It then checks the availability of the servers that contain a replica and redirects the user to the most available server. If no other cluster server contains a replica or if all cluster servers are BUSY, the original database opens, even though the server is BUSY.

Example
This example describes how Domino performs workload balancing. This cluster contains three servers. Server 2 is currently BUSY because the workload has reached the availability threshold that the administrator set for this server. The Cluster Managers on Server 1 and Server 3 are aware that Server 2 is BUSY.

Workload balancing in a cluster

1. A Notes user attempts to open a database on Server 2.

2. Domino sends Notes a message that the server is BUSY.

3. Notes looks in its cluster cache to find the names of the other servers in the cluster.

4. Notes accesses the Cluster Manager on the next server listed in the cluster cache.

5. The Cluster Manager looks in the Cluster Database Directory to find which servers in the cluster contain a replica of the desired database.

6. The Cluster Manager looks in its server cluster cache to find the availability of each server that contains a replica.

7. The Cluster Manager creates a list of the servers in the cluster that contain a replica of the database, sorts the list in order of availability, and sends the list to Notes.

8. Notes opens the replica on the first server in the list (the most available server). If that server is no longer available, Notes opens the replica on the next server in the list.

The cluster components

There are several components that work together to make clustering function correctly. These include:
• The Cluster Manager
• The Cluster Database Directory
• The Cluster Database Directory Manager
• The Cluster Administrator
• The Cluster Replicator
• The Internet Cluster Manager

These components are described in the following sections, except the Internet Cluster Manager, which is described in the section "Clustering Domino Servers that Run Internet Protocols."

The Cluster Manager
A Cluster Manager runs on each server in a cluster and tracks the state of all the other servers in the cluster. It keeps a list of which servers in the cluster are currently available and maintains information about the workload on each server.
When you add a server to a cluster, Domino automatically starts the Cluster Manager on that server. As long as the server is part of a cluster, the Cluster Manager starts each time you start the server.
Each Cluster Manager monitors the cluster by exchanging messages, called probes, with the other servers in the cluster. Through these probes, the Cluster Manager determines the workload and availability of the other cluster servers. When it is necessary to redirect a user request to a different replica, the Cluster Manager looks in the Cluster Database Directory to determine which cluster servers contain a replica of the requested database. The Cluster Manager then informs the client which servers contain a replica and the availability of those servers. This lets the client redirect the request to the most available server that contains a replica.
The tasks of the Cluster Manager include:
• Determining which servers belong to the cluster. It does this by periodically monitoring the Domino Directory for changes to the ClusterName field in the Server document and the cluster membership list.
• Monitoring server availability and workload in the cluster.
• Informing other Cluster Managers of changes in server availability.
• Informing clients about available replicas and availability of cluster servers so the clients can redirect database requests based on the availability of cluster servers (failover).
• Balancing server workloads in the cluster based on the availability of cluster servers.
• Logging failover and workload balance events in the server log file.

When it starts, the Cluster Manager checks the Domino Directory to determine which servers belong to the cluster. It maintains this information in memory in the server's Cluster Name Cache. The Cluster Manager uses this information to exchange probes with other Cluster Managers. The Cluster Manager also uses the Cluster Name Cache to store the availability information it receives from these probes. This information helps the Cluster Manager perform the functions listed above, such as failover and workload balancing.
To view the information in the Cluster Name Cache, type "show cluster" at the server console.

The Cluster Database Directory
A replica of the Cluster Database Directory (CLDBDIR.NSF) resides on every server in a cluster. The Cluster Database Directory contains a document about each database and replica in the cluster. This document contains such information as the database name, server name, path, replica ID, and other replication and access information. The cluster components use this information to perform their functions, such as determining failover paths, controlling access to databases, and determining which events to replicate and where to replicate them to.

The Cluster Database Directory Manager
The Cluster Database Directory Manager on each server creates the Cluster Database Directory and keeps it up-to-date with the most current database information. When you first add a server to a cluster, the Cluster Database Directory Manager creates the Cluster Database Directory on that server. When you add a database to a clustered server, the Cluster Database Directory Manager creates a document in the Cluster Database Directory that contains information about the new database. When you delete a database from a clustered server, the Cluster Database Directory Manager deletes this document from the Cluster Database Directory. The Cluster Database Directory Manager also tracks the status of each database, such as databases marked "Out of Service" or "Pending Delete."
When there is a change to the Cluster Database Directory, the Cluster Replicator immediately replicates that change to the Cluster Database Directory on each server in the cluster. This ensures that each cluster member has up-to-date information about the databases in the cluster.

The Cluster Administrator
The Cluster Administrator performs many of the housekeeping tasks associated with a cluster. For example, when you add a server to a cluster, the Cluster Administrator starts the Cluster Database Directory Manager and the Cluster Replicator. The Cluster Administrator also starts the Administration Process, if it is not already running. When you remove a server from a cluster, the Cluster Administrator stops the Cluster Database Directory Manager and the Cluster Replicator. It also deletes the Cluster Database Directory on that server and cleans up records of the server in the other servers' Cluster Database Directories.
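
In practice, these components appear as ordinary server tasks. As an illustration only (the exact task list varies by installation and release), the ServerTasks line in a clustered server's NOTES.INI typically includes the Cluster Database Directory Manager (Cldbdir) and Cluster Replicator (Clrepl) tasks:

  ServerTasks=Update,Replica,Router,AMgr,AdminP,Sched,CalConn,Cldbdir,Clrepl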

The Cluster Replicator
The Cluster Replicator constantly synchronizes data among replicas in a cluster. Whenever a change occurs to a database in the cluster, the Cluster Replicator quickly pushes the change to the other replicas in the cluster. This ensures that each time users access a database, they see the most up-to-date version. The Cluster Replicator also replicates changes to private folders that are stored in a database. Each server in a cluster runs one Cluster Replicator by default, although you can run more Cluster Replicators if there is a lot of activity in the cluster.
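
To run additional Cluster Replicators, you can set the number of Cluster Replicator tasks in NOTES.INI (or load extra Clrepl tasks from the server console). A minimal sketch, with an illustrative value:

  Cluster_Replicators=2
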
The Cluster Replicator looks in the Cluster Database Directory to determine which databases have replicas on other cluster members. The Cluster Replicator stores this information in memory and uses it to replicate changes to other servers. Periodically (every 15 seconds by default), the Cluster Replicator checks for changes in the Cluster Database Directory. When the Cluster Replicator detects a change in the Cluster Database Directory -- for example, an added or deleted database or a database that now has Cluster Replication disabled -- it updates the information it has stored in memory.
The Cluster Replicator pushes changes to servers in the cluster only. The standard replicator task (REPLICA) replicates changes to and from servers outside the cluster.

Optimizing server performance (top 10 ways to improve your server performance)

By analyzing a variety of NotesBench reports published over the last two years by NotesBench Consortium members, we came up with a list of the top 10 ways you can improve the performance of your server. The list shows you how to improve your server capacity and response time.

  1. Make sure your server memory matches the number of users you want to support. Most NotesBench vendors use 300K-400K per active user. They also set their NSF_BUFFER_POOL_SIZE to the maximum for their memory configuration. This setting isn't necessary, because the Domino server initially obtains a quarter of available memory and grows only if necessary (depending on the load). You should use published physical memory configurations as a ceiling for memory configuration decisions.
  2. Distribute I/O among separate devices. For example, you can put the OS kernel on one drive, the page file on another, the Domino executable on a third, and finally the Domino data files on a fourth drive. In some cases, NotesBench vendors point their log.nsf file to a location different from the default data directory, using the log= setting in the server's NOTES.INI file (see the sketch after this list).
  3. Improve your I/O subsystem. For example, you can:
    • Move from EISA-based systems (such as controllers) to PCI-based systems
    • Exchange EISA/PCI boards in favor of PCI-only boards (this way, lower-speed EISA devices won't decrease the I/O throughput)
    • Use striping to improve performance
    • Use multiple I/O controllers to distribute logical volumes (and use file pointers to databases across separate controllers). Make sure you have the latest BIOS for your I/O subsystem. This is an inexpensive way to remove a likely throughput bottleneck.
  4. Use faster disk drives. For example, moving from 5,400 rpm to 7,200 rpm drives improves throughput. For most Windows NT systems, NotesBench vendors used 2GB disk drives. For Solaris and IBM Netfinity systems, the drives were larger: 4GB. For AS/400, the drives were even larger: 8GB.
  5. Increase the stripe size. NotesBench vendors use a stripe size of 8K (Digital's systems) or 16K (IBM Netfinity reports). (The IBM Netfinity report provides additional information on I/O settings such as IOQ Depth, Outbound Posting, PCI Line Prefetch, and Address Bit Permitting.)
  6. Use faster CPUs. NotesBench vendors have moved beyond the Pentium, Sparc, and PowerPC processors, which were in the 100-200MHz range, to higher speed processors. However, they consistently use P6-based systems over the Pentium II systems for high-end Domino server loads. The size of your Level 2 cache should match your expected user loads and the response time you want. Vendors have moved from 256KB to 512KB, 1MB, and 2MB Level 2 caches, especially on their configurations with more than two CPUs.
  7. Improve your network. NotesBench vendors have:
    • Moved from 10Mbps cards and networks to 100Mbps configurations
    • Used multiple LAN segments (one for each partition) to isolate network traffic at high-end user loads
  8. Change your network protocol to IP. Vendors were initially (two years ago) using NetBIOS and SPX internally, but have unanimously moved to IP for their performance publishing efforts.
  9. Upgrade to a newer release of Domino. NotesBench vendors have moved from Domino Release 4.5a SMP version to Domino Release 4.52B SMP version for higher capacity results. The first Domino Release 4.6a result (AS/400) on a RAID5 configuration indicates a reliable configuration can still provide competitive response time with a properly designed I/O architecture.
  10. Use Domino partitioned servers. NotesBench vendors have increased scaling of active user loads and leveraged their more powerful configurations (faster clock cycles, fiber-connected I/O subsystems, OS kernel to CPU binding, and multiple I/O controllers) by using partitioned servers.
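
As a concrete illustration of item 2, redirecting the log file is a one-line NOTES.INI change. This is a sketch only: the drive letter and path are hypothetical, and the trailing arguments shown are the commonly cited defaults (log option, an unused argument, days to retain, and log text size):

  Log=E:\LOGS\LOG.NSF, 1, 0, 7, 40000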

How we came up with these recommendations

To understand how we came up with our top 10 list, we will walk you through the performance analysis behind item 2 in the list: distributing I/O among separate devices. Initially, many vendors placed the kernel, page file, and Domino executables on one volume and the Domino data files on another, with both volumes on the same controller. More recent NotesBench reports show improved performance when the volumes are spread across multiple controllers and individual volumes are spread across disks. In practice, vendors put the OS kernel on one drive, the page file on another, the Domino executable on a third, and the Domino data files on a fourth drive. In some cases, they pointed their log.nsf file to a location different from the default data directory (using the log= setting in the server's NOTES.INI file). Vendors who distributed I/O over several disk drives had better server performance overall and could support more users.

For example, in a NotesBench report published in May of 1996, Digital Equipment Corporation set up a server with the following specifications:

  • CPUs: four 133MHz processors
  • Memory: 512MB
  • Domino: Release 4.1

They placed the operating system and the Domino executable on drive C:\, the page file on drive D:\, and the Notes\data directory on drive E:\. They could support a maximum capacity of 1,500 users with this configuration.

In a NotesBench report published in September of 1997, IBM Corporation set up a server with the following specifications:

  • CPUs: three 200MHz Intel Pentium Pro processors
  • Memory: 1GB
  • Domino: Release 4.51

They placed the operating system on drive C:\, the page file on drive C:\, the Notes\data directory on drive E:\, and the Domino executable on drive E:\. They supported a Mail-only workload of 3,500 active mail users. In a four-processor configuration, they supported a MailDB workload of 2,900 active users.

These examples led us to the conclusion that distributing I/O over several disk drives yields better server performance overall and supports more users. We went through many other NotesBench reports to collect the data shown in our top 10 list. You can visit the NotesBench Web site yourself to view published data and test results; it may help you come up with other ways to improve your server's performance.

Tools for troubleshooting replication

Database access control list problems, server crashes, protocol problems, and incorrectly configured Connection documents are common causes of replication errors. Use these tools to troubleshoot replication.

Cluster replication

The log file (LOG.NSF) provides helpful information for troubleshooting replication problems within a cluster.

Log file

To access the log, from the IBM® Lotus® Domino® Administrator, click the Servers - Analysis tab and select the log file for the server you want to check. Then check for replication problems in these views:

  • Miscellaneous events
  • Phone calls
  • Replication events

Tip: You can also check replication events from the Replication tab in the Domino Administrator.

Edit the NOTES.INI file to include the Log_Replication setting, which allows you to display detailed replication information in the log.
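
A minimal NOTES.INI sketch (a value of 1 simply turns replication logging on; some releases accept higher values for progressively more detail):

  Log_Replication=1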

Monitoring Configuration

The Monitoring Results database (STATREP.NSF) is a repository for pre-configured and custom statistics. It is created when you load the Collect task, if it doesn't already exist. You can set alarms for some of these statistics. For example, you might set an alarm to generate a Failure report when more than three attempted replications generate an error. You can also report statistics to any database designed for this purpose, although typically the database is the Monitoring Results database (STATREP.NSF).

Note that you can edit the NOTES.INI file to include the Repl_Error_Tolerance setting, which increases the number of identical replication errors between two databases that a server tolerates before it terminates replication. The default tolerance is 2 errors. The higher the value, the more often messages such as "Out of disk space" appear.
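
For example, to let replication continue through up to 10 identical errors before it is terminated, you could add a line like the following to NOTES.INI (the value is illustrative):

  Repl_Error_Tolerance=10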

If you run the Event task on a server, you can set up an Event Monitor document to report replication problems. You can also create a Replication Monitor document that notifies you if a specific database fails to replicate within a certain time. To view events from the Domino Administrator, click the Server - Analysis tab, click Statistics - Events, and then view the desired report.

Replication history

The replication history for a database describes each successful replication of a database. To view the replication history of a database, select a database icon and choose File - Application - Properties (or File - Application - Replication - History).

Replication schedules

You can see a graphical representation of the replication schedules of the servers in your Domino system. To view replication schedules, from the Domino Administrator, click the Replication tab.

Replication topology maps

Create a replication topology map to display the replication topology and identify connections between servers. To view replication topology maps, from the Domino Administrator, click the Replication tab. You must load the Topology maps task before you can view a replication topology map.
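
The topology map task is normally started from the server console (or added to the ServerTasks setting in NOTES.INI). Assuming the standard Maps task name, the console command looks like this:

  > load maps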