Saturday, June 26, 2010

FAQ - Transactional Logging

Q: How does replication work with transaction logging enabled? Does the replicator read from the transactional log for any changes that need to be replicated from the Unified Buffer Manager (UBM) or are changes committed to the NSF first before replication occurs?

A: Replication is unaffected by Transactional Logging. Changes are available in the same manner as they are for un-logged databases. Changes are always read from the in-memory versions of databases when they are open and all cached changes are flushed when databases are closed.


Q: If a user is trying to read a note that has not been committed to the NSF with transactional logging enabled, does the user session read the note from the transactional log or the UBM, or is the note committed to the database first?

A: The note is read from the in-memory versions of databases when they are open and all cached changes are flushed when databases are closed.


Q: If something is done to change the DBIID of a database, are changes committed first before the DBIID gets changed?

A: Yes. Data is not lost. Operations that change the DBIID, such as Fixup -J and Compact -B, do so because they make database changes that are not logged. The order of operations is this:
(1) Flush all in-memory changes to the database
(2) Clear the DBIID and halt logging for the database
(3) Make the un-logged changes
(4) Flush all in-memory changes to the database
(5) Assign a new DBIID and start logging changes to the database again.

Note: None of these steps can be done manually; all of them are done automatically. The one manual step is to take a new backup of the database after you see the console message "Assigning New DBIID for DB xxxx."
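
To make the sequence concrete, here is a purely illustrative Python sketch of the five steps. The dictionary fields and DBIID values are invented for the example and have nothing to do with Domino's internal structures:

db = {"dbiid": 1001, "cache": ["note edit"], "disk": [], "logged": True}

def flush(d):
    # harden any cached changes to the NSF on disk
    d["disk"] += d["cache"]
    d["cache"] = []

flush(db)                                        # (1) flush in-memory changes
db["dbiid"], db["logged"] = None, False          # (2) clear DBIID, halt logging
db["disk"].append("unlogged structural change")  # (3) make the un-logged changes
flush(db)                                        # (4) flush again
db["dbiid"], db["logged"] = 1002, True           # (5) new DBIID, logging resumes
print(db)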


Q: When is it beneficial to run fixup -j?

A: Hopefully never. It would be beneficial only if the database became corrupt and you did not have a backup to roll forward from.


Q: When fixup -j is run on the entire server I notice that it assigns a new DBIID to some databases but not others. Why is this?

A: A new DBIID should be assigned to every transactionally logged database. Fixup -j runs against unlogged as well as logged databases, but only logged databases receive a new DBIID. Check the logging status of any database that did not have its DBIID reassigned and validate that it is not logged.


Q: If the DBIID of a database has changed for some reason, how is the database with the old DBIID and the new DBIID incorporated?

A: There is no intermixing. You must take a new backup of a database when the DBIID changes. Note: there is no relation to the REPLICAID or DBID.


Q: If a server panics and upon restart Domino automatically performs database recovery, will a large amount of data be lost or is the loss minimal?

A: All cached information will be re-applied, and only partial API calls will be backed out. That is, if the server was in the middle of a NoteUpdate call, that call would be undone. You will see EVERY NOTE you ever added (where the API call returned) with no loss of data. The Transactional Log is written directly, with no file cache (that is, all writes are "committed," and writes are made to the Transactional Log at least once per transaction or API call).
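
As a rough illustration of that guarantee (not Domino's actual log format), this Python sketch redoes the transactions that committed and drops the one API call that was still in flight at crash time:

# Each tuple is (record type, transaction id[, payload]); layout is invented.
log = [
    ("begin", 1), ("write", 1, "note A"), ("commit", 1),
    ("begin", 2), ("write", 2, "note B"), ("commit", 2),
    ("begin", 3), ("write", 3, "note C"),   # crash: no commit record written
]

committed = {rec[1] for rec in log if rec[0] == "commit"}
recovered = [rec[2] for rec in log
             if rec[0] == "write" and rec[1] in committed]
print(recovered)   # ['note A', 'note B'] -- the partial call (note C) is backed out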


Q: If only one database on a server becomes corrupt is it possible to restore only this one database from the transactional log or will the entire transactional log need to be replayed?

A: Transactional Logging restores only the database(s) that need to be restored. The whole log from the time of the backup must be replayed, but only records pertaining to the databases being recovered are processed.
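
A hedged sketch of that selective replay, with an invented record layout: the whole log is scanned, but only records whose DBIID matches a database being restored are applied.

log_records = [
    {"dbiid": 0xAAAA, "op": "update note 1"},
    {"dbiid": 0xBBBB, "op": "update note 7"},
    {"dbiid": 0xAAAA, "op": "delete note 2"},
]

restoring = {0xAAAA}                 # DBIID(s) of the database(s) being recovered
for rec in log_records:              # the whole log is read ...
    if rec["dbiid"] in restoring:    # ... but only matching records are replayed
        print("replay:", rec["op"])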


Q: What is the control file used for?

A: It is used by the logger subsystem to track the state of the log files. The logging subsystem is all of the code in nsf\logger; some of it runs in the "application" process, meaning the process that makes logging calls (for example, the server).


Q: If there are any database or directory links on the server, will the linked databases be transactionally logged as well?

A: Yes. The actual database is logged, independent of any links.


Q: If Compact changes the DBIID, then what happens to a backup copy of the database? Will the log be able to recover in the event a database is replaced on-line?

A: Recovery assumes you did not copy back an older version of the database while the server was crashed. Never "touch" databases after a server crash until you get through a clean restart. You can force a clean restart by simply entering the command "nfixup zzzz" before the Domino server is restarted (where zzzz is a database that does not exist). The initialization of Fixup will trigger restart recovery and bring the logged databases to a clean state. This clears the UBM and hardens all the information to the transactionally logged databases (the actual NSF files); the server can then be restarted. A consistency check will run against all non-transaction-logged databases.

Q: When the server is live, how is it possible to monitor the health of the transactional log?

A: Any extra "monitoring" of the log at run-time is not recommended as it will disturb your system performance. The log is not read very much while the server runs so it is true that issues with the logs would not soon be noticed; however, the log is written to constantly and all hardware would give a bad return code and generate error events if write errors occurred.


Q: The customer uses FTP to move NSF files between "live" servers. Can the Transactional Log recover from finding a new database live while the server is running?

A: IMPORTANT: Never OS-copy or FTP a live logged database. An OS-copied database is subject to a consistency check when it is placed on the new server, and a file transferred this way can be seen while it is still only half there.

It is acceptable, however, to pull or create a new database replica of a live logged database. A new replica creates a replica stub on the new server and populates the database later, preserving the state of the transaction logging.

There are no known problems adding new databases (new databases do not have a DBIID yet) to a server "live", but you must get a "clean" version and you should make it appear "instantaneously" so it does not show up as only half there, as it would when you FTP it.

The recommendation is as follows:
  1. Use dbbackup from the API samples or a backup vendor's utility (the dbbackup test tool takes a backup and "applies all pending changes," so it's a healthy backup). Make the extension of the database something non-NSF, like .bak. (A scripted sketch of steps 1-3 follows this list.)
  2. FTP or copy the bak file to the other server.
  3. Rename the bak to nsf so it "appears" instantaneously.
  4. The next open will assign a new DBIID and run fixup if it was logged before, to clear the log sequence numbers in the file. It may take a while if it was a large database, but then it will behave normally.

An existing database that is not open that needs to be moved from one server to another should be done using Notes\New Copy or New Replica. Prior to the move, the extension should be changed to something other than .nsf. The database can then be moved via new copy or new replica. When the move is complete, the extension should be changed back to .nsf. This method ensures that transactional logging begins logging the database only when it is complete and consistent. It is important to note that this will change the DBIID of the database so a backup must then be taken.


Q: A database on a Domino server is continuously becoming corrupted and is then being marked as read only. If transactional logging is enabled on the server will it mark the database as read only?

A: The only thing that transactional logging will do is mark the database corrupt. It will never mark the database as read only.


Q: How does transactional logging operate with SCOS (shared mail)?

A: Transactional logging for SCOS does not exist in R5.x. This capability is present in Notes/Domino 6.
Product documentation

Abstract
Domino supports transaction logging and recovery. With this feature enabled, the system captures database changes and writes them to the transaction log. Then if a system or media failure occurs, you can use the transaction log and a third-party backup utility to recover your databases.



Content
Lotus Domino supports transaction logging and recovery. With this feature enabled, the system captures database changes and writes them to the transaction log. Then if a system or media failure occurs, you can use the transaction log and a third-party backup utility to recover your databases.
IMPORTANT: Enabling transaction logging can improve server performance in most cases. Transaction logging saves processing time because it allows Domino to defer database updates to disk during periods of high server activity. Transactions are recorded sequentially in the log files, which is much quicker than database updates to random, nonsequential parts of a disk. Because the transactions are already recorded, Domino can safely defer database updates until a period of low server activity.


What is transaction logging?

Transaction logging keeps a sequential record of every operation that occurs to data. If a database becomes corrupted, you can "roll back" the database to a point before it was corrupted and replay the changes from the transaction log.

A single transaction is a series of changes made to a database on a server -- for example, a transaction might include opening a new document, adding text, and saving the document.

Transaction logging provides three main benefits:

- In most situations, you no longer need to run the Fixup task to recover databases following a system failure. Excluding Fixup results in quicker server restarts, since Fixup must check every document in each database, while transaction log recovery applies or undoes only those transactions not written to disk at the time of the system failure.

- Transaction logging saves processing time because it allows Domino to defer database updates to disk during periods of high server activity. Transactions are recorded sequentially in the log files, which is much quicker than database updates to random, nonsequential parts of a disk. Because the transactions are already recorded, Domino can safely defer database updates until a period of low server activity.

- Using transaction logging simplifies your daily backup procedure. You can use a third-party backup utility to perform daily incremental backups of the transaction logs, rather than perform full database backups.

IMPORTANT: Transaction logging works with databases in format ODS 41 or higher but not with databases that use formats from earlier releases (ODS 20 will not work). After you enable transaction logging, all databases are automatically logged. To check database formats, use the Files tab in Domino Administrator.

NOTE: To use all of the features of transaction logging and recovery, you need a third-party backup utility that supports Domino transaction logging.



What is considered a transaction?
A transaction is a single API call. It includes creating, modifying, reading (for example, an unread-marks change), or deleting documents. A transaction is considered COMPLETE when the user has saved the change to disk. For example, if a user makes a change to the database and does not save that change before the server crashes, that transaction is not COMPLETE. The transaction would have been COMPLETE only if the user had saved the change before the server crashed. COMPLETE transactions are "committed" to the transactional log.


What is a Transaction log?
A transactional log is a binary file where transactions are written. The transactional log has a .txn file extension. These .txn files should never be deleted. The maximum size of each log extent (.txn file) is 64 MB. You can have several .txn logs based on the size specified in the Server document. The maximum total size of the .txn files is 4 GB.
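
Putting those numbers together, a quick arithmetic sketch; the 3-to-64 extent range comes from the setup notes later in this article:

EXTENT_MB, MAX_TOTAL_MB = 64, 4096   # 64 MB per .txn extent, 4 GB total cap

def extent_count(max_log_space_mb: int) -> int:
    # Domino formats at least 3 and up to 64 log files
    space = min(max_log_space_mb, MAX_TOTAL_MB)
    return max(3, min(64, space // EXTENT_MB))

print(extent_count(192))    # default 192 MB  -> 3 extents
print(extent_count(4096))   # maximum 4 GB    -> 64 extents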



What is the Database Instance ID (DBIID)?
When you enable transaction logging, Domino assigns a Database Instance Identifier (DBIID) to each Domino database. When Domino records a transaction in the log, it includes the DBIID. During recovery, Domino uses the DBIID to match transactions to databases (it identifies which database the changes should be applied to). The DBIID is stored in the file header, along with the database ID and the Replica ID. Note: There is no relation to the Replica ID or the DBID.

Some database maintenance activities, such as compaction with options, cause Domino to assign a new DBIID to a database. From that point forward, all new transactions recorded in the log use the new DBIID; however, any old transactions still have the old DBIID and no longer match the database's new DBIID. As a result, Domino cannot restore these old transactions to the database.

To avoid losing data, you should immediately perform a full database backup whenever a database receives a new DBIID. When you perform this backup, you capture all the database transactions up until that point and ensure that Domino needs only the new transactions (with the new DBIID) to restore the database. If the DBIID changes and a backup is not taken afterward, the database cannot be successfully restored (the backup will have the old DBIID, and the transactional log will no longer "know" the old DBIID).


Domino assigns a new DBIID to a database when:

- You enable transaction logging for the first time.
- Transaction logging is disabled and then re-enabled.
- The database is compacted using copy-style compaction.
- Fixup is forced on the database (Fixup -J).
- You move the database from one logged server to another logged server, or from an unlogged server to a logged server.

IMPORTANT NOTES:
  • If a database is logged, the default for Compact with no switches is -b (lowercase).
  • If a database is un-logged, the default for Compact with no switches is -B (uppercase).
  • Compact with no switches and Compact -b (lowercase b) are the only Compact runs that do not change the DBIID.
  • The DBIID changes when a database is copy-style compacted because copy-style compaction essentially creates an entire new NSF with a new structure, which no longer matches the structure recorded in the logs for the "old" NSF. Note: -L, -c, and -i are switches that enable copy-style compaction; -B at times uses copy-style compaction.
  • Compact -B may change the DBIID. This option uses in-place compaction unless there is a pending structural change, in which case copy-style compaction occurs. So when using this option with transaction logging, do full database backups after compacting completes.

NOTE: Changing the log path or maximum log size (after initial set up and use) does not trigger a DBIID change.



How to set up Transaction logging

  1. Ensure that all databases to be logged reside in the Domino data directory, either at the root or in subdirectories.

  2. From the Domino Administrator, click the Configuration tab.

  3. In the "Use Directory on" field, choose the server's Domino Directory.

  4. Click Server Configuration, and then click Current Server Document.

  5. Click the Transactional Logging tab.

  6. Complete these fields, and then save the document.

Field: Transactional Logging
Enter: Choose Enabled. The default is Disabled.

Field: Log path
Enter: Path name location of the transaction log. The default path name is \LOGDIR in the Domino data directory, although it is strongly recommended to store the log on a separate, mirrored device, such as a RAID (Redundant Array of Independent Disks) level 0 or 1 device with a dedicated controller. The separate device should have at least 1GB of disk space for the transaction log. If you are using the device solely for storing the transaction log, set the "Use all available space on log device" field to Yes.

Field: Maximum log space
Enter: The maximum size, in MB, for the transaction log. Default is 192MB. Maximum is 4096MB (4GB). Domino formats at least 3 and up to 64 log files, depending on the maximum log space you allocate.

Field: Use all available space on log device
Enter: Choose one:
  • Yes to use all available space on the device for the transaction log. This is recommended if you use a separate device dedicated to storing the log. If you choose Yes, you don't need to enter a value in the "Maximum log space" field.
  • No to use the default or specified value in the "Maximum log space" field.

Field: Automatic fixup of corrupt databases
Enter: Choose one:
  • Enabled (default). If a database is corrupt and Domino cannot use the transaction log to recover it, Domino runs the Fixup task, assigns a new DBIID, and notifies the administrator that a new database backup is required.
  • Disabled. Domino does not run the Fixup task automatically and notifies the administrator to run the Fixup task with the -J parameter on corrupt logged databases.

Field: Runtime/Restart performance
Enter: This field controls how often Domino records a recovery checkpoint in the transaction log, which affects server performance. To record a recovery checkpoint, Domino evaluates each active logged database to determine how many transactions would be necessary to recover each database after a system failure. When Domino completes this evaluation, it:
  • Creates a recovery checkpoint record in the transaction log, listing each open database and the starting point transaction needed for recovery.
  • Forces database changes to be saved to disk if they have not been saved already.
Choose one:
  • Standard (default and recommended). Checkpoints occur regularly.
  • Favor runtime. Domino records fewer checkpoints, which requires fewer system resources and improves server run time performance.
  • Favor restart recovery time. Domino records more checkpoints, which improves restart recovery time because fewer transactions are required for recovery.

Field: Logging style
Enter: Choose one:
  • Circular (default) to continuously re-use the log files and overwrite old transactions. You are limited to restoring only the transactions stored in the transaction log.
  • Archive (recommended) to not re-use the log files until they are archived. A log file can be archived when it is inactive, which means that it does not contain any transactions necessary for a restart recovery. Use a third-party backup utility to copy and archive the existing log. When Domino starts using the existing file again, it increments the log file name. If all the log files become inactive and are not archived, Domino creates additional log files.
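
To illustrate the difference between the two logging styles, here is a toy Python sketch; three extents stand in for a real circular log:

from collections import deque

circular = deque(maxlen=3)        # circular: fixed extents, oldest overwritten
archive = []                      # archive: extents kept until a backup archives them

for txn in range(1, 8):
    circular.append(f"txn {txn}")
    archive.append(f"txn {txn}")  # archive style creates more extents instead

print(list(circular))   # ['txn 5', 'txn 6', 'txn 7'] -- older ones unrecoverable
print(archive)          # all seven remain until archived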


How to disable Transaction Logging for a specific database
In most cases, disabling Transaction Logging (at the server or database level) is not recommended because you lose all of the benefits of transaction logging (there are no ill side effects of disabling; you simply lose the benefits). One of the benefits of transaction logging is fast server restart. Disabling transaction logging will cause Fixup to run on the database (or on all databases on the server), creating the potential for a slow restart.

After you set up transaction logging, all databases that are in Domino Release 5 or higher format are logged. You can disable transaction logging of specific databases.

Attachments are transactionally logged; however, attachments are logged redo-only. Therefore, if the database is recovered using media recovery, you will get back the last copy of the attachment (once attachment updates are done, they stay done). If, however, the server crashes with uncommitted attachment updates, those updates will not be undone, since an undo record is never created for them.

Views are not logged, so after media recovery, you will need to rebuild views.

First, perform any of the following:
  • When creating a new database, choose "Disable transaction logging" on the Advanced Databases Options dialog.
  • For an existing database, choose "Disable transaction logging" on the Advanced (beanie) tab of the Database Properties box.
  • In Domino Administrator, select a database on the Files tab, choose Tools - Database - Advanced Properties, then choose "Disable transaction logging"
  • Use the Compact task with the -t parameter.

Second, ensure that all users have closed the database. Next, use the DBCACHE command with the "flush" parameter to close the database in the database cache. Finally, open the database.



How to schedule backups of Transaction logs and logged databases
Backups are essential for recovering from a media failure, which is a failure of the server's disk or disks. If you have a third-party backup utility, you should:

- Schedule daily incremental backups of the transaction log. Use the backup utility daily to back up the transaction log.
- Schedule archiving of transaction log files. If you use the archive logging style, use a third-party backup utility to schedule archiving of log files.
- Schedule weekly full database backups. Each week, it is recommended to run the Compact task with the option to reduce file size. Because this compaction style changes each database's DBIID, you should schedule compaction with a full database backup.



How to fix corrupted databases
Corrupted databases don't occur frequently when you use Release 5 or higher databases and transaction logging. When you use transaction logging to log changes to Release 5 or higher databases, a server automatically uses the transaction log to restore and recover databases after a system failure, for example after server failures or power failures. If a disk failure occurs, you use the transaction log along with a certified backup utility to restore and recover the databases.



Using Transaction logging for recovery
Transaction logging is an integral part of recovering from system and media failures. A system failure causes the server to stop and requires you to restart the server. During restart, Domino automatically performs database recovery. The system uses the transaction logs to apply or undo database transactions not flushed to disk for databases that were open during the system failure.

Domino also runs the Fixup task on databases that use formats from earlier releases, on databases that are in Release 5 or higher format but have transaction logging disabled, and on corrupt databases if the "Automatic fixup of corrupt databases" field in the Server document is enabled.



Fixup -J
Causes Fixup to run on databases that are enabled for transaction logging. Fixup -j should only be run if a database is corrupt and you have no backup of the database to roll forward from.

Without the -j option, Fixup generally doesn't run on logged databases. The Fixup task interferes with the way transaction logging keeps track of databases. If you are using a backup utility certified for Domino, it's important that you schedule a full backup of the database as soon after Fixup finishes as possible.



Notes.ini parameter: Translog_Status
The TRANSLOG_Status NOTES.INI parameter is used to enable transaction logging for all databases on the server. "0" is disabled, "1" is enabled.

Thursday, June 24, 2010

How failover and load balancing work in a Domino cluster

How failover works
A cluster's ability to redirect requests from one server to another is called failover. When a user tries to access a database on a server that is unavailable or in heavy use, Domino directs the user to a replica of the database on another server in the cluster.

The Cluster Manager on each cluster server sends out probes to each of the other cluster servers to determine the availability of each server. The Cluster Manager also checks continually to see which replicas are available on each server. When a user tries to access a database that is not available, the user request is redirected to a replica of the database on a different server in the cluster. Although the user connects to a replica on a different server, failover is essentially transparent to the user.

Example
This example describes the process that Domino uses when it fails over. This cluster contains three servers. Server 1 is currently unavailable. The Cluster Managers on Server 2 and Server 3 are aware that Server 1 is unavailable.

Failover in a cluster

1. A Notes user attempts to open a database on Server 1.

2. Notes realizes that Server 1 is not responding.

3. Instead of displaying a message that says the server is not responding, Notes looks in its cluster cache to see if this server is a member of a cluster and to find the names of the other servers in the cluster. (When a Notes client first accesses a server in a cluster, the names of all the servers in the cluster are added to the cluster cache on the client. This cache is updated every 15 minutes.)

4. Notes accesses the Cluster Manager on the next server listed in the cluster cache.

5. The Cluster Manager looks in the Cluster Database Directory to find which servers in the cluster contain a replica of the desired database.

6. The Cluster Manager looks in its server cluster cache to find the availability of each server that contains a replica. (The server cluster cache contains information about all the servers in the cluster. Cluster servers obtain this information when they send probes to the other cluster servers.)

7. The Cluster Manager creates a list of the servers in the cluster that contain a replica of the database, sorts the list in order of availability, and sends the list to Notes.

8. Notes opens the replica on the first server in the list (the most available server). If that server is no longer available, Notes opens the replica on the next server in the list. In this example, Server 2 was the most available server.

When the Notes client shuts down, it stores the contents of the cluster cache in the file CLUSTER.NCF. Each time the client starts, it populates the cluster cache from the information in CLUSTER.NCF.
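
Putting steps 1-8 together, a hedged Python sketch of the client side of failover; the server names and availability numbers are invented for illustration:

cluster_cache = ["Server2", "Server3"]   # loaded from CLUSTER.NCF at startup

def replicas_by_availability():
    # Step 7: what a Cluster Manager returns -- replica holders sorted
    # most-available-first (the numbers are made up).
    availability = {"Server2": 80, "Server3": 45}
    return sorted(availability, key=availability.get, reverse=True)

def open_database(preferred="Server1",
                  reachable=frozenset({"Server2", "Server3"})):
    if preferred in reachable:
        return preferred                  # normal open, no failover
    # Steps 3-4: fall back to the cluster cache and ask a Cluster Manager.
    for server in replicas_by_availability():
        if server in reachable and server in cluster_cache:
            return server                 # step 8: open the most available replica
    raise RuntimeError("no replica available")

print(open_database())   # -> 'Server2', as in the example above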


How workload balancing works
By distributing databases throughout the cluster, you balance the workload in the cluster so that no server is overloaded. In addition, there are several NOTES.INI variables you can set to help balance the workload. For example, you can specify a limit on how busy a server can get by specifying an availability threshold. When the server reaches the availability threshold, the Cluster Manager marks the server BUSY. When a server is BUSY, requests to open databases are sent to other servers that contain replicas of the requested databases. You can also specify the maximum number of users you want to access a server. When the server reaches this limit, users are redirected to another server. This keeps the workload balanced and keeps the server working at optimum performance.

When a user tries to open a database on a BUSY server, the Cluster Manager looks in the Cluster Database Directory for a replica of that database. It then checks the availability of the servers that contain a replica and redirects the user to the most available server. If no other cluster server contains a replica or if all cluster servers are BUSY, the original database opens, even though the server is BUSY.
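
A minimal sketch of that redirect decision, assuming an invented availability index; the fallback to the original (BUSY) server matches the behavior described above:

def pick_server(requested, replica_servers, busy):
    # replica_servers maps server name -> availability index (invented scale)
    candidates = {s: a for s, a in replica_servers.items()
                  if s != requested and s not in busy}
    if not candidates:                    # no other replica, or all BUSY
        return requested                  # the original database opens anyway
    return max(candidates, key=candidates.get)

print(pick_server("Server2",
                  {"Server1": 0.9, "Server2": 0.1, "Server3": 0.5},
                  busy={"Server2"}))      # -> 'Server1'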

Example
This example describes how Domino performs workload balancing. This cluster contains three servers. Server 2 is currently BUSY because the workload has reached the availability threshold that the administrator set for this server. The Cluster Managers on Server 1 and Server 3 are aware that Server 2 is BUSY.

Workload balancing in a cluster

1. A Notes user attempts to open a database on Server 2.

2. Domino sends Notes a message that the server is BUSY.

3. Notes looks in its cluster cache to find the names of the other servers in the cluster.

4. Notes accesses the Cluster Manager on the next server listed in the cluster cache.

5. The Cluster Manager looks in the Cluster Database Directory to find which servers in the cluster contain a replica of the desired database.

6. The Cluster Manager looks in its server cluster cache to find the availability of each server that contains a replica.

7. The Cluster Manager creates a list of the servers in the cluster that contain a replica of the database, sorts the list in order of availability, and sends the list to Notes.

8. Notes opens the replica on the first server in the list (the most available server). If that server is no longer available, Notes opens the replica on the next server in the list.

The cluster components

There are several components that work together to make clustering function correctly. These include:
• The Cluster Manager
• The Cluster Database Directory
• The Cluster Database Directory Manager
• The Cluster Administrator
• The Cluster Replicator
• The Internet Cluster Manager

These components are described in the following sections, except the Internet Cluster Manager, which is described in the section "Clustering Domino Servers that Run Internet Protocols."
The Cluster Manager
A Cluster Manager runs on each server in a cluster and tracks the state of all the other servers in the cluster. It keeps a list of which servers in the cluster are currently available and maintains information about the workload on each server.
When you add a server to a cluster, Domino automatically starts the Cluster Manager on that server. As long as the server is part of a cluster, the Cluster Manager starts each time you start the server.
Each Cluster Manager monitors the cluster by exchanging messages, called probes, with the other servers in the cluster. Through these probes, the Cluster Manager determines the workload and availability of the other cluster servers. When it is necessary to redirect a user request to a different replica, the Cluster Manager looks in the Cluster Database Directory to determine which cluster servers contain a replica of the requested database. The Cluster Manager then informs the client which servers contain a replica and the availability of those servers. This lets the client redirect the request to the most available server that contains a replica.
The tasks of the Cluster Manager include:
• Determining which servers belong to the cluster. It does this by periodically monitoring the Domino Directory for changes to the ClusterName field in the Server document and the cluster membership list.
• Monitoring server availability and workload in the cluster.
• Informing other Cluster Managers of changes in server availability.
• Informing clients about available replicas and availability of cluster servers so the clients can redirect database requests based on the availability of cluster servers (failover).
• Balancing server workloads in the cluster based on the availability of cluster servers.
• Logging failover and workload balance events in the server log file.

When it starts, the Cluster Manager checks the Domino Directory to determine which servers belong to the cluster. It maintains this information in memory in the server's Cluster Name Cache. The Cluster Manager uses this information to exchange probes with other Cluster Managers. The Cluster Manager also uses the Cluster Name Cache to store the availability information it receives from these probes. This information helps the Cluster Manager perform the functions listed above, such as failover and workload balancing.
To view the information in the Cluster Name Cache, type "show cluster" at the server console.
The Cluster Database Directory
A replica of the Cluster Database Directory (CLDBDIR.NSF) resides on every server in a cluster. The Cluster Database Directory contains a document about each database and replica in the cluster. This document contains such information as the database name, server name, path, replica ID, and other replication and access information. The cluster components use this information to perform their functions, such as determining failover paths, controlling access to databases, and determining which events to replicate and where to replicate them to.
The Cluster Database Directory Manager
The Cluster Database Directory Manager on each server creates the Cluster Database Directory and keeps it up-to-date with the most current database information. When you first add a server to a cluster, the Cluster Database Directory Manager creates the Cluster Database Directory on that server. When you add a database to a clustered server, the Cluster Database Directory Manager creates a document in the Cluster Database Directory that contains information about the new database. When you delete a database from a clustered server, the Cluster Database Directory Manager deletes this document from the Cluster Database Directory. The Cluster Database Directory Manager also tracks the status of each database, such as databases marked "Out of Service" or "Pending Delete."
When there is a change to the Cluster Database Directory, the Cluster Replicator immediately replicates that change to the Cluster Database Directory on each server in the cluster. This ensures that each cluster member has up-to-date information about the databases in the cluster.
The Cluster Administrator
The Cluster Administrator performs many of the housekeeping tasks associated with a cluster. For example, when you add a server to a cluster, the Cluster Administrator starts the Cluster Database Directory Manager and the Cluster Replicator. The Cluster Administrator also starts the Administration Process, if it is not already running. When you remove a server from a cluster, the Cluster Administrator stops the Cluster Database Directory Manager and the Cluster Replicator. It also deletes the Cluster Database Directory on that server and cleans up records of the server in the other servers' Cluster Database Directories.
The Cluster Replicator
The Cluster Replicator constantly synchronizes data among replicas in a cluster. Whenever a change occurs to a database in the cluster, the Cluster Replicator quickly pushes the change to the other replicas in the cluster. This ensures that each time users access a database, they see the most up-to-date version. The Cluster Replicator also replicates changes to private folders that are stored in a database. Each server in a cluster runs one Cluster Replicator by default, although you can run more Cluster Replicators if there is a lot of activity in the cluster.
The Cluster Replicator looks in the Cluster Database Directory to determine which databases have replicas on other cluster members. The Cluster Replicator stores this information in memory and uses it to replicate changes to other servers. Periodically (every 15 seconds by default), the Cluster Replicator checks for changes in the Cluster Database Directory. When the Cluster Replicator detects a change in the Cluster Database Directory -- for example, an added or deleted database or a database that now has Cluster Replication disabled -- it updates the information it has stored in memory.
The Cluster Replicator pushes changes to servers in the cluster only. The standard replicator task (REPLICA) replicates changes to and from servers outside the cluster.
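
A hedged sketch of the loop just described; the transport and directory reads are stubbed out, and only the push-immediately, refresh-every-15-seconds shape is the point:

import time

replica_map = {}                # db path -> cluster members holding a replica
last_refresh = float("-inf")

def push(server, db, change):
    # stand-in for the real cluster replication transport
    print(f"push {change!r} of {db} to {server}")

def refresh_replica_map(read_cldbdir, interval=15):
    # The real task re-checks the Cluster Database Directory every
    # 15 seconds by default.
    global last_refresh
    if time.monotonic() - last_refresh >= interval:
        replica_map.update(read_cldbdir())
        last_refresh = time.monotonic()

def on_database_change(db, change):
    # Event-driven: changes are pushed immediately, in-cluster only.
    for server in replica_map.get(db, []):
        push(server, db, change)

refresh_replica_map(lambda: {"sales.nsf": ["Server2", "Server3"]})
on_database_change("sales.nsf", "doc 42 updated")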

Optimizing server performance (top 10 ways to improve your server performance)

By analyzing a variety of NotesBench reports, published over the last two years by NotesBench Consortium members, we came up with a list of the top 10 ways you can improve the performance of your server. The list shows you how to improve your server capacity and response time.

  1. Make sure your server memory matches the number of users you want to support. Most NotesBench vendors use 300K-400K per active user. They also set their NSF_BUFFER_POOL_SIZE to the maximum for their memory configuration. This setting isn't necessary, because the Domino server initially obtains a quarter of available memory and grows only if necessary (depending on the load). You should use published physical memory configurations as a ceiling for memory configuration decisions. (A quick sizing sketch follows this list.)
  2. Distribute I/O among separate devices. For example, you can put the OS kernel on one drive, the page file on another, the Domino executable on a third, and finally the Domino data files on a fourth drive. In some cases, NotesBench vendors point their log.nsf file to a location different from the default data directory (using the log= setting in the server's NOTES.INI file).
  3. Improve your I/O subsystem. For example, you can:
    • Move from EISA-based systems (such as, controllers) to PCI-based systems
    • Exchange EISA/PCI boards in favor of PCI-only boards (this way, lower speed EISA devices won't decrease the I/O throughput)
    • Use striping to improve performance
    • Use multiple I/O controllers to distribute logical volumes (and use file pointers to databases across separate controllers). Make sure you have the latest BIOS for your I/O subsystem. This is an inexpensive way to remove a likely throughput bottleneck.
  4. Use faster disk drives. You can improve disk drive speeds from 5,400 rpm to 7,200 rpm. For most Windows NT systems, NotesBench vendors use 2GB disk drives. For Solaris and IBM Netfinity systems, the drives were larger: 4GB. For AS/400, the drives were even larger: 8GB.
  5. Increase the stripe size. NotesBench vendors use a stripe size of 8K (Digital's systems) or 16K (IBM Netfinity reports). (The IBM Netfinity report provides additional information on I/O settings such as IOQ Depth, Outbound Posting, PCI Line Prefetch, and Address Bit Permitting.)
  6. Use faster CPUs. NotesBench vendors have moved beyond the Pentium, Sparc, and PowerPC processors, which were in the 100-200Mhz range, to higher speed processors. However, they consistently use P6-based systems over the Pentium II systems for high-end Domino server loads. The size of your Level 2 cache should match your expected user loads and the response time you want. Vendors have moved from 256K and 512K to 1MB and 2MB Level 2 caches, especially on their greater-than-two-CPU configurations.
  7. Improve your network. NotesBench vendors have:
    • Moved from 10Mbps cards and networks to 100Mbps configurations
    • Used multiple LAN segments (one for each partition) to isolate network traffic, at the high-end user loads
  8. Change your network protocol to IP. Vendors were initially (two years ago) using NetBIOS and SPX internally, but have unanimously moved to IP for their performance publishing efforts.
  9. Upgrade to a newer release of Domino. NotesBench vendors have moved from Domino Release 4.5a SMP version to Domino Release 4.52B SMP version for higher capacity results. The first Domino Release 4.6a result (AS/400) on a RAID5 configuration indicates a reliable configuration can still provide competitive response time with a properly designed I/O architecture.
  10. Use Domino partitioned servers. NotesBench vendors have increased scaling of active user loads and leveraged their more powerful configurations (faster clock cycles, fiber-connected I/O subsystems, OS kernel to CPU binding, and multiple I/O controllers) by using partitioned servers.
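
As a back-of-the-envelope check on item 1 above, a small sizing sketch using the 300K-400K-per-user rule of thumb; the 350K midpoint is our assumption, not a vendor figure:

def server_memory_mb(active_users, kb_per_user=350):
    # 300K-400K per active user, per the NotesBench figures in item 1
    return active_users * kb_per_user / 1024

print(round(server_memory_mb(1500)))   # ~513 MB for 1,500 users

That lands near the 512MB, 1,500-user Digital configuration discussed in the next section.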

How we came up with these recommendations

To understand how we came up with our top 10 list, we will take you through the performance analysis of Number 2 in the list -- to distribute I/O among separate devices. Initially, many vendors placed the kernel, page, and Domino executables on one volume and the Domino data files on another. However, both volumes were on the same controller. Lately, the NotesBench reports show improvements in performance when the volumes are separated across multiple controllers, and individual volumes are separated across disks. In practice, vendors put the OS kernel on one drive, the page file on another, the Domino executable on a third, and finally the Domino data files on a fourth drive. In some cases, they pointed their log.nsf file to a location different from the default data directory (using the log= setting in the server's NOTES.INI file). Vendors who distributed the I/O over several disk drives had better server performance overall, and could support a higher capacity of users. For example, in a NotesBench report published in May of 1996, Digital Equipment Corporation set up a server with the following specifications:

  • CPUs: four 133Mhz CPUs
  • Memory: 512MB
  • Domino: Release 4.1

They placed the operating system and the Domino executable on drive C:\, the page file on drive D:\, and the Notes\data directory on drive E:\. They could support a maximum capacity of 1,500 users with this configuration.

In a NotesBench report published in September of 1997, IBM Corporation set up a server with the following specifications:

  • CPUs: three 200MHz Intel Pentium Pro processors
  • Memory: 1GB
  • Domino: Release 4.51

They placed the operating system on drive C:\, the page file on drive C:\, the Notes\data directory on drive E:\, and the Domino executable on drive E:\. They supported a Mail-only workload of 3,500 active mail users. In a four-processor configuration, they supported a MailDB workload of 2,900 active users. These examples led us to the conclusion that distributing I/O over several disk drives gave better server performance overall and could support a higher capacity of users. We went through many other NotesBench reports to collect the data shown in our top 10 list. You can visit the NotesBench Web site yourself to view published data and test results. Visiting the site may help you to come up with other ways to improve your server's performance.

Tools for troubleshooting replication

Database access control list problems, server crashes, protocol problems, and incorrectly configured Connection documents are common causes of replication errors. Use these tools to troubleshoot replication.

Cluster replication

The log file (LOG.NSF) provides helpful information for troubleshooting replication problems within a cluster.

Log file

To access the log, from the IBM® Lotus® Domino® Administrator, click the Servers - Analysis tab and select the log file for the server you want to check. Then check for replication problems in these views:

  • Miscellaneous events
  • Phone calls
  • Replication events

Tip You can also check replication events from the Replication tab in the Domino Administrator.

Edit the NOTES.INI file to include the Log_Replication setting, which allows you to display detailed replication information in the log.

Monitoring Configuration

The Monitoring Results database (STATREP.NSF) is a repository for pre-configured and custom statistics. It is created when you load the Collect task, if it doesn't already exist. You can set alarms for some of these statistics. For example, you might set an alarm to generate a Failure report when more than three attempted replications generate an error. You can also report statistics to any database designed for this purpose, although typically the database is the Monitoring Results database (STATREP.NSF).

Note that you can edit the NOTES.INI file to include the Repl_Error_Tolerance setting, which increases the number of identical replication errors between two databases that a server tolerates before it terminates replication. The default tolerance is 2 errors. The higher the value, the more often messages such as "Out of disk space" appear.

If you run the Event task on a server, you can set up an Event Monitor document to report replication problems. You can also create a Replication Monitor document that notifies you if a specific database fails to replicate within a certain time. To view events from the Domino Administrator, click the Server - Analysis tab, click Statistics - Events, and then view the desired report.

Replication history

The replication history for a database describes each successful replication of a database. To view the replication history of a database, select a database icon and choose File - Application - Properties (or File - Application - Replication - History).

Replication schedules

You can see a graphical representation of the replication schedules of the servers in your Domino system. To view replication schedules, from the Domino Administrator, click the Replication tab.

Replication topology maps

Create a replication topology map to display the replication topology and identify connections between servers. To view replication topology maps, from the Domino Administrator, click the Replication tab. You must load the Topology maps task before you can view a replication topology map.

Wednesday, June 23, 2010

Indexer tasks: Update and Updall

The Update and Updall tasks keep view indexes and full-text indexes up-to-date.

Update

Update is loaded at server startup by default and runs continually, checking its work queue for views and folders that require updating. The indexer uses modest system resources by waiting five seconds between each database update operation that it performs.

The Update task performs three different updating tasks:

  • Updates views in the Domino Directory.
  • Updates views in all other databases. When a request is made to update a view, the view is updated only if there have been at least 20 note changes since the last update and the view has been accessed in the last 7 days. Keeping views updated speeds view opening in the Notes client. If views are not updated often, the only effect on users or applications is a slow view open time, because views are automatically updated when opened. (See the threshold sketch after this list.)
  • Updates full-text indexes. Full-text indexing provides the ability to search for notes that have been recently added. If a note is added after the most recent full-text indexing, that note will not be found by a full text search.
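
A hedged sketch of the thresholds mentioned in the second bullet above; the parameter names echo the UPDATE_NOTE_MINIMUM and UPDATE_ACCESS_FREQUENCY settings referenced later in this article:

def should_update_view(note_changes, days_since_access,
                       note_minimum=20, access_window_days=7):
    # The view is refreshed only when both thresholds are met;
    # otherwise it is left for an open-time refresh.
    return note_changes >= note_minimum and days_since_access <= access_window_days

print(should_update_view(25, 2))    # True: enough changes, recently used
print(should_update_view(25, 30))   # False: stale view, refreshed on next open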


The Update task uses a separate thread for full-text indexing, which makes view updates more timely than in releases prior to Domino 7.

Update maintains two queues of work -- an immediate queue and a deferred queue. Other server components, such as the router and replicator, post requests to the updater when changes are made to databases. Some requests are posted as deferred and some as immediate.

This table lists how full-text index updates are performed according to the update frequency:

Daily: Performed by the nightly Updall task. If this nightly task is not run, the daily updating is not performed.

Scheduled: Performed by a Program document that runs Updall. You need to set the frequency to Scheduled and create the proper Program document. You can also use this method to update different databases at different times.

Hourly: Triggered by the chronos task and performed by the Update task if the Update task is running. If the Update task is not running, chronos performs the update. If the chronos task is not running, the update is not performed.

Immediate: Performed by the Update task. If Update is not running, the update is not performed. All immediate requests are processed as they are received.

Deferred: Deferred requests are held for 15 minutes before they are processed. Requests to update the same database that occur in that time are ignored as duplicate requests.

When a view or folder change is recorded in the queue, Update waits approximately 15 minutes before updating all view indexes in the database so that the update can include any other database changes made during the 15-minute period. After updating view indexes in a database, it then updates all databases that have full-text search indexes set for immediate or hourly updates.
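
A toy Python model of those two queues and the deferral window; the data structures are invented, but the duplicate-dropping behavior matches the description above:

import time
from collections import deque

immediate = deque()    # processed as requests arrive
deferred = {}          # db path -> time the deferred request was posted

def post_request(db, urgent=False):
    if urgent:
        immediate.append(db)
    elif db not in deferred:    # duplicates inside the window are dropped
        deferred[db] = time.monotonic()

def due_deferred(window_seconds=15 * 60):
    # requests become eligible once the ~15-minute window has elapsed
    now = time.monotonic()
    return [db for db, t in deferred.items() if now - t >= window_seconds]

post_request("mail\\alice.nsf")
post_request("mail\\alice.nsf")        # ignored as a duplicate request
post_request("names.nsf", urgent=True)
print(list(immediate), list(deferred))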

When Update encounters a corrupted view index or full-text index, it deletes the index and rebuilds it in an attempt to correct the problem.

Note The Update task spawns a directory indexer thread. The directory indexer runs at one-minute intervals and is dedicated to keeping Domino Directory view indexes up-to-date so that any changes to the directory are usable as soon as possible. The directory indexer runs against any local or remote Domino Directory or Extended Directory Catalog that a server uses for directory services. The task of updating the Domino Directory view indexes does not lock the views, and you should be able to create new server sessions while this task is running.

To improve view-indexing performance, you can run multiple Update tasks if your server has adequate CPU power.

Managing the update task and its use of system resources

The indexer is able to keep up with the update rate in the server's default configuration if the server has a low update rate, that is, if few changes are made to databases on the server. If a server has a high update rate due to heavy application database use, a large number of mail users, or a large volume of mail, the default resource usage configuration can cause the updater queues to become large. To determine whether the updater queues are large, examine the queue length statistics that are available in Lotus Domino 7. If you determine that the update queues are too large, determine a methodology for performing updates on that server. Long queues typically indicate that views and full-text indexes are not up-to-date.

Here are some sample scenarios and practices that you may want to use as well as the steps to implement them.

  • Scenario one -- The queues are usually short, unless a full-text index update starts for a database with a large update volume. When this occurs, the view updating requests wait for the full-text index, which causes the queues to grow until the full-text indexing is complete. To use slightly more system resources and keep the queues short, perform view updates and full-text index updates in separate threads. To do so, enter the variable UPDATE_FULLTEXT_THREAD=1 in your server's NOTES.INI file.
  • Scenario two -- The queues grow slowly over time and become too long because the Updater task is not getting sufficient system resources to keep them short. The updater pauses between operations; by default, the delay is 5 seconds. To allow the Update task to use additional system resources and keep the queues short, shorten this delay by setting the variables UPDATE_IDLE_TIME (and FTUPDATE_IDLE_TIME if two threads are used) in the server's NOTES.INI file to less than 5 seconds. Finer precision may be required on a large server; in that case, you can set the delay in milliseconds (Domino 7 only) by adding the variables UPDATE_IDLE_TIME_MS (and FTUPDATE_IDLE_TIME_MS if two threads are used) to the server's NOTES.INI file.
  • Scenario three -- Servers that have high update rates often require too many system resources to keep the queues small. In this case, you can decide not to perform view updates at all and just allow view opens to perform the updates automatically. Disable view updates by adding the variable UPDATE_DISABLE_VIEWS=1 to the server's NOTES.INI file. Another option is to limit the number of immediate updates for full-text databases: change the update frequency for databases to hourly, daily, or a specific schedule. You can also delete extraneous full-text indexes.

To allow frequent full-text indexing on only a small number of databases, and to prevent other databases from being full-text indexed, disable full-text indexing in the Updater and then add Program documents to schedule Updall to run, for example, every half hour (30 minutes). To disable full-text indexing in the Updater, enter the variable UPDATE_DISABLE_FULLTEXT=1 in the server's NOTES.INI file.

You can prevent any updates at all and just allow view opens to perform the view updates automatically. To prevent updates, remove Update from the ServerTasks variable in the NOTES.INI file.

If a system has adequate system resources to perform updates, you can run multiple Update tasks. To do so, edit the variable, ServerTasks, in the NOTES.INI file and add a second Update task.

You can adjust the controls that determine whether a modified view is actually updated or not. The database and view must still be opened, but if these thresholds are not reached, the view is not updated.


For more information, see UPDATE_ACCESS_FREQUENCY and UPDATE_NOTE_MINIMUM as well as other NOTES.INI settings.

Updall

Updall is similar to Update, but it doesn't run continually or work from a queue; instead you run Updall as needed. You can specify options when you run Updall, but without them Updall updates any view indexes or full-text search indexes on the server that need updating. To save disk space, Updall also purges deletion stubs from databases and discards view indexes for views that have been unused for 45 days, unless the database designer has specified different criteria for discarding view indexes. Use the NOTES.INI setting Default_Index_Lifetime_Days to change when Updall discards unused view indexes.
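
A one-function sketch of the discard rule, with the 45-day default standing in for the Default_Index_Lifetime_Days setting:

def should_discard_view(days_unused, lifetime_days=45):
    # Default_Index_Lifetime_Days (or a designer-set discard option)
    # overrides the 45-day default.
    return days_unused >= lifetime_days

print(should_discard_view(60))   # True  -> index discarded, rebuilt on next open
print(should_discard_view(10))   # False -> index kept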

Like Update, Updall rebuilds all corrupted view indexes and full-text search indexes that it encounters.

By default Updall is included in the NOTES.INI setting ServerTasksAt2, so it runs daily at 2 AM. Running Updall daily helps save disk space by purging deletion stubs and discarding unused view indexes. It also ensures that all full-text search indexes that are set for daily updates are updated.

Note When views are being rebuilt - either through the Designer or Updall tasks - all new server sessions that are attempted once the rebuild process has started are locked out. Therefore, it is recommended that changes to master templates, as well as complete view rebuilds, be scheduled for late at night, when users are far less likely to require access to the server.

The following table compares the characteristics of Update and Updall. For Updall, the table describes default characteristics. For information on options you can use to modify some of these characteristics, see the topic Updall options.

When it runs
  Update: Continually after server startup
  Updall: At 2 AM and when you run it

Runs on all databases?
  Update: No. Runs only on databases that have changed.
  Updall: Yes

Refreshes view indexes?
  Update: Yes
  Updall: Yes

Updates full-text indexes?
  Update: Yes. Updates full-text indexes set for immediate and hourly updates.
  Updall: Yes. Updates all full-text indexes.

Detects and attempts to rebuild corrupted view indexes?
  Update: Yes
  Updall: Yes

Detects and attempts to rebuild corrupted full-text indexes?
  Update: Yes
  Updall: Yes

Purges deletion stubs?
  Update: No
  Updall: Yes

Discards unused view indexes?
  Update: No
  Updall: Yes (after a view is unused for 45 days, or according to a view discard option specified by a designer)

Ignores "Refresh index" view property?
  Update: Yes
  Updall: Yes

Can customize with options?
  Update: No
  Updall: Yes