Sunday, September 26, 2010

Optimizing server performance: Semaphores (Part 2)

A semaphore is to your system what a traffic signal is to an intersection: a mechanism that keeps things running smoothly. Specifically, semaphores ensure that a server completes certain tasks before it begins other tasks.

This article is part two in a two-part series on recognizing, troubleshooting, and preventing semaphore issues. Part one of this series discusses why semaphore timeouts occur and how to troubleshoot the reasons behind them. This article builds on the advice from the first article and presents additional troubleshooting techniques, this time based on real-life experiences. Considerations to keep in mind while designing applications and using LotusScript, as well as R5-specific troubleshooting techniques also appear in this article.

Troubleshooting semaphores: Advice and real experiences

After reading the output from the Sem.Timeout command (see the Correlating a semaphore number with a specific process sidebar from this article and the Analysis of a Sem.Timeouts line sidebar from the first article) or after reading Domino server console messages and parsing out the information displayed, you understand areas of the system where the semaphore timeout occurred. Based on that information, you can try some of the following techniques to continue narrowing the focus area around the semaphore timeout.

Unless otherwise noted, these techniques evolved from R4 experiences. Use them in conjunction with the troubleshooting advice provided in part one of this article. For easy reference, we’ve organized the techniques according to the area of the system where the semaphore timeouts occur (Database, Server, Indexer). You may recognize some of the techniques as they are applicable not only to semaphore timeout issues, but also to general Domino server administration and troubleshooting.

Check disk configuration and database placement (Database)

Check your disk I/O response rate and configuration to improve end-user response time. Use the platform tools to discern the disk activity rate. Disk configuration analysis means looking at the physical disk allocation, logical disk allocation, relationship to disk controllers, and so on. Once you understand your disk configuration, analyze the relationship of the various Domino components, the location of Domino executables, the Domino data files, the paging file, and the transaction log. For more information, see the Iris Today article, "Optimizing server performance: I/O subsystems" and Lotusphere99 presentation ID603, Maximizing DB performance, reliability, availability & scalability.

To reduce database contention independent of making disk configuration changes, try approaches that may reduce database access contention or improve file access time. Some ways to accomplish such improvements include moving the database to a different disk, a different controller, or a less busy disk. For more information on pinpointing specific database issues, see the section on the Show DBS command later in this article.

Try clustering (Database)

You may want to pursue a clustering strategy if the database semaphores 0244 and 0245 appear and you notice the following:

  • Many active users on the system (from review of Server.Users value -- use the Show Stat Server command to display this value).
  • Considerable activity opening databases (from review of the Show DBS output -- see the section below for more detail).
  • Considerable activity opening views (from review of LOG_UPDATE).

By clustering servers, you can distribute some user workload. Please refer to the Iris Today articles, "Optimizing server performance: Domino clusters Part 1" and Part 2 for solution ideas.

Redistribute user population (Database)

Another tactic to use when you notice a high count of active users on the system is to move users off the system. This tactic applies to application and mail servers only. For reference, see the posted benchmark reports on the NotesBench Web site. These reports provide information on the maximum number of users for a given system configuration executing a specified workload.

Once you move the users to other Domino systems, analyze whether you need to move their associated databases. Also, after you move users off the system, review the change in system performance with respect to databases accessed and the activity rate for the different Domino server tasks (internally and externally developed).

Review the new server console command Show DBS -- R5 only (Database)

The Show DBS command, new in R5, displays useful information about the databases currently in use, such as the number of times a database has been opened, whether the database has been modified, and the number of times a user has had to wait for a lock on the database. See the How we used output from Show DBS sidebar to review output from this command and learn how we used this output to help us detect bottleneck situations. Also see the topic "Improving database and Domino Directory performance" in Domino 5 Administration Help for additional information on this command.

Review output from Show Database command (Database)

Use the output from Show Database when analyzing a specific database. The output from this command reports a variety of information, including the sizes of the different views in the database (in byte count) and what different objects can be found in the database (for example, documents, forms, and views). The following is an excerpt from the output from the Show Database command:

> Show Database demo.nsf




Sample Database




Document TypeLiveDeleted
Documents234966
Info00
Form870
View800



View sizesBytes
People231,208
Server\Connections131,880
($ServerAccess)1,531,440
($Users)1,721,480



Marks view0

For example, we learned from our R4 experiences that the $Users view is large and often rebuilt. From such observations, we employed coding changes to improve upon the lower-level Indexing structures, more typically referred to as the B-Trees. Specifically, we changed the B-Trees storage mechanism so that the B-Trees would be rebuilt in the update areas, not necessarily rebuilding everything all of the time. Consequently, we expect the view rebuild issues to appear less frequently. We do understand that as Domino R5 scales more, users will scale more with us, and the new upper boundaries reached will challenge boundary limits.

Review output from Show Directory console command (Database)

The output from the Show Directory command represents the entries currently found in the Domino database cache. The entries include a database’s full path, name, version number, modified time, and state of transaction logging. We provide sample output from the Show Directory command below (the Modified Time column has been omitted in this example):

>Show Directory


[017E:0003-01CF] DbNameVersionLogged
[017E:0003-01CF] f:\notefile\schema50.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\Stats.boxV5:27No
[017E:0003-01CF]f:\notefile\mail.boxV5:40No
[017E:0003-01CF]f:\notefile\mail2.boxV5:41Yes
[017E:0003-01CF]f:\notefile\mail1.boxV5:41Yes
[017E:0003-01CF]f:\notefile\wsj.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\wrkinst.nsfV3:17No
[017E:0003-01CF]f:\notefile\wpissues.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\webuser.nsfV3:17No
[017E:0003-01CF]f:\notefile\webadmin.ntfV4:20No
[017E:0003-01CF]f:\notefile\webadmin.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\web.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\VinodTes.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\userreg.ntfV4:20No
[017E:0003-01CF]f:\notefile\usenet.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\usegate.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\unixdisc.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\unames4.ntfV4:20No
[017E:0003-01CF]f:\notefile\uiv5info.nsfV5:41Yes
[017E:0003-01CF]f:\notefile\uiteam.nsfV5:41Yes

If you observe a good amount of contention on a given database (reviewing information from the Show DBS command), and see that the database version is from an earlier Domino release, it is time to update the revision of the database to see if the problem goes away.

Check replication strategy (Database)

You should review the replication strategy specified for your topology, as the Replicator task can impose a load on the Domino server. If you start seeing the database’s view collection semaphore (030B) and database’s view collection queue semaphore (0309), it may be time to locate the database and view in question. You can get this information (the database and view name) from the LOG_UPDATE and LOG_VIEW_EVENTS output. Activity on a given view can result from user activity or server task requests -- in particular, Domino Directory lookups slow down the replication activity because view rebuild or replication logging takes too long. If you want to defer the rebuild on the Domino Directory, set the NOTES.INI setting SERVER_NAME_LOOKUP_NOUPDATE=1. This variable allows access to name lookup views in the Domino Directory during the view index process. It also allows access to otherwise locked views during the view process for authentication and mail routing purposes. See Lotus Customer Support Technote #150232, What Does the SERVER_NAME_LOOKUP_NOUPDATE=1 Server NOTES.INI Parameter Do? for more information on the use and benefit of this variable.

You can also review the Domino statistics for replica entries (via the Show Stat Replica command) and see the rate of documents being added, deleted or updated. Reviewing this information in synch with the Show DBS command may paint a larger picture. At this point, you may notice the number of waiters growing. There is a relationship here, as the database semaphore is needed to successfully execute the add, delete, and update requests from the Replicator. This means in order to perform the replication successfully, the target database needs to be accessed. If this database has a lot of user/server tasks that want access to it, the replication process cannot complete.

Create multiple Mail.Box files (Database)

If you notice the Domino statistic Mail.Waiting growing or at a high number and the database semaphore messages appearing (0244 or 0245), use Show Stat Mail to review the Mail Router statistics. For R5, make sure you’re taking advantage of the feature to create multiple mail boxes. Refer to Lotusphere99 presentation ID603, Maximizing DB performance, reliability, availability & scalability, or Domino 5 Administration Help for information on configuring multiple mail boxes.

Using more than one mail box reduces the time wasted waiting for an update to occur, and interfacing with more than one mail box enables the Router to operate more efficiently. There is a trade-off point though, where supporting multiple mail boxes might not bring additional response time benefits or more efficient use of system resources.

You can begin the move to multiple mail boxes by moving from one mail box to two. Then you can increment to three or four mail boxes on a single Domino server if necessary. For information about mail delivery threads, see the Tell Router Show command (described below) and review the Lotusphere99 presentation ID601, Deploying Domino R5 for performance and scalability.

Adjust NSF_BUFFER_POOL_SIZE (Database)

Once you adjust the NOTES.INI setting NSF_BUFFER_POOL_SIZE within the valid range, the current set of users and applications might experience a performance improvement. Keep in mind that adjusting this value is often a short term solution, particularly if you request additional buffer pool space, which requires more overall memory. Thus, less memory is available for additional and future user or application growth. Also, adjusting this value limits the amount of memory available to the operating system’s caching requirements.

In Domino R5, we changed the NSF Buffer Pool allocation algorithm to use what the Domino server needed, up to the defaulted or specified amount. The R4 strategy would pre-allocate and reserve the default or specified amount. For example, in R5, if you specify a value of 10,000 for the buffer pool size and only 5,000 is needed, only 5,000 is allocated. In Domino R4, if you specify a value of 10,000 for the buffer pool size and only 5,000 is needed, 10,000 is allocated.

Turn on transaction logging -- R5 only (Database)

New to R5, transaction logging enables you to recover your databases should a system failure occur. Generally, we have observed performance improvements with the recommended use of transaction logging. Why? If the system experiences semaphore contention issues, the extra performance throughput that transaction logging provides may eliminate or minimize the contention situations. In summary, the sequential nature that transaction logging uses to write out the data translates to a much more efficient method for saving and updating the data that Domino stores. For more information, see the Iris Today article, "Optimizing server performance: Transaction logging" or Domino 5 Administration Help.

Analyze server transactions (Server)

Output from the Show Trans command gives you a table of information, including the transaction function types the server executes, the execution frequency (count), and the associated execution times (minimum, maximum, total, average). This information gathers continuously, so if the transaction information hasn’t been cleared (via a Show Trans Reset command), it should represent a summarized history of transactions the Domino server executed since it started (or since last reset). Refer to Domino 5 Administration Help for more information on this command.

From Show Trans output, you can discern a certain system execution profile for your Domino server. You can identify a pattern of transaction types, how often they occur, and which ones take longer to execute. Each transaction has its own unique requirements for CPU, memory, disk utilization and other Domino resources. For example, if the server does a lot of mail processing, more transactions appear in that feature area and less in other areas.

Note that each server executes different types of transactions so you cannot always compare the transaction profiles across different servers, particularly "specialized" servers such as those focused on mail routing or database applications. Some transactions are supposed to take longer to execute, so each transaction needs to be reviewed and analyzed separately.

If you become familiar with Show Trans output, you can determine if excessive time is spent in a general feature area or if a specific transaction is taking a longer than average time to execute. Note the numbers, as there may be a wide range when viewing the "Min" column as compared to the "Max" column, and then factor in the average value. Capturing this data more frequently and noting the widespread data ranges (that is, when the spread occurred) gives you another valuable nugget of information when trying to understand what your Domino server is doing at a given point in time.

Here is sample output from a Show Trans command:

FunctionCountMinMaxTotalAverage
OPEN_DB2610250221085
OPEN_NOTE6206026043
UPDATE_NOTE2610640160061
DB_INFO_GET10000
SEARCH3307701020340
DB_REPLINFO_GET23010703
REMOTE_CONSOLE7040608
CLOSE_DB2601402409
CLOSE_COLLECTION5010102
OPEN_COLLECTION53013503100620
READ_ENTRIES5031035070
NIF_OPEN_NOTE120202020
SET_COLLATION2010105
READ REPLICATION HISTORY13024095073
WRITE REPLICATION HISTORY5010204
GET_MULT_NOTE_INFO_BY_UNI270190260130

In the sample output above, look at the values for the various columns associated with a given server transaction. From these values, you can draw different conclusions. For example, when the maximum value is close to the average value, this implies a fairly constant response time. However, a large range between maximum and average value implies variability in the response time or some internal blocking. It is also interesting to note if the minimum range is close or far from the maximum or average values, which would imply a constant response rate (meaning the system is probably not overutilized). Conversely, a wider response range would imply varying loads on the server, which potentially overutilizes the server.

Note that Web-based interactions do not include entries in the Server Transaction table (this includes HTTP, IMAP, LDAP, POP3, and so on).

Review specific server transactions (Server)

For a quick look at some of the server transactions (from Show Trans output), concentrate on the OPEN_COLLECTION, OPEN_DB and OPEN_NOTE transactions. They have associated "CLOSE_" type transactions (CLOSE_COLLECTION, CLOSE_DB and CLOSE_NOTE). These transactions give you insight into the view, database, and individual note level of activity. The START_SERVER transaction includes the authentication process. So if you want to determine if there is a server authentication issue or if you notice that users attaching to the server are taking a longer time to connect to the server, the START_SERVER transaction is the one to monitor.

The transaction information can be cleared via the Show Trans Reset command. We recommend you clear the command when trying to isolate a problem situation, assuming it is a reoccurring problem situation. The command clears the whole transaction table list. As transactions are executed after this point, they are added to the list. The numbers associated with the transactions will then more clearly reflect the activities and execution time for a shorter time range.

Check server tasks (Server)

If contention exists on a database on which you do not normally see contention (that is, not a "well known" database), then dive a little deeper and figure out what impact other Domino server tasks have on the database. Many of the tasks have associated console commands, which can help provide additional insight. The following list provides some console commands you can use for further information about the server tasks and their activity.

  • Agent Manager -- Use Tell Amgr Status to identify the total number and elapsed time of the Agent Manager runs. You can also check out the Agent Manager statistics, under Show Stat Amgr for more details.
  • Router -- Use Tell Router Show for information on how the transfer threads and delivery threads are configured and executing. The output from the command includes the maximum number of threads and the total number of used and inactive threads. For the transfer thread configuration, you also see the maximum number configured to execute concurrently. You receive the same output from the Tell Router Show command as the Tell Router Status command.
  • Mail -- Use Show Stat Mail to identify the current mail status. The output from this command gives you a variety of datapoints relating to the mail item delivery rate and the data throughput for the mail items.

You might think viewing output from these commands is equivalent to looking for a needle in a haystack; however, once you have the general feel for how your system behaves, an uncharacteristic response from one or more of the commands can help guide you to the next course of action, which may reduce the semaphore contention issues or internal bottlenecks.

Adjust SERVER_MAX_CONCURRENT_TRANS value (Server)

The default value for the NOTES.INI setting SERVER_MAX_CONCURRENT_TRANS is 20. The recommendation for changes applies to a single partition and to Domino Release 4.x only where the threadpool architecture enhancement is not available. Before beginning any adjustments, become familiar with the System Counter Context Switches/sec (found in NT Perfmon). This value is a good one to watch as changes are made to the transaction rate value. As a general rule, you can increase the SERVER_MAX_CONCURRENT_TRANS value in increments of 20. If the Context Switching number starts to go higher, then you’ve reached or exceeded the limit of where things should be. We recommend that after every adjustment, you wait a week to make sure the Domino server remains in a stable state.

From an end-user perspective, the value is set too high when the Notes clients are timing out and not successfully connecting to the Domino server. From the Domino server console perspective, the value is too high when after issuing the Show Users command, you notice that names are not always listed next to the session. The names are not listed because the authentication process does not successfully complete to provide more background information. Another variation of the same issue occurs when as part of the mail delivery process, the $Users view (needed for successful mail delivery) is accessed for granting permission onto the Domino server. Once successfully accessed, then a name is assigned to the session. The unauthenticated session problem just described is an example of an outcome that can result if a Domino server is forced to go beyond reasonable expectations.

We found another example of users unsuccessfully trying to authenticate in R4 production environments in the rebuilding of the $ServerAccess view (needed for authentication). In a large Domino Directory, it takes longer than average to rebuild the view. When there are 20 users concurrently active and the rest of the user population waits on an event, such as requesting an authentication through the server, this situation causes a continuous loop where Notes clients time out. The Notes clients that have timed out try and reattach to the server, which means they need to authenticate. However, the authentication process cannot complete until the $ServerAccess view is rebuilt. And processes other than the Indexer process cannot gain access to the view, so the Notes clients time out again. Thus, the thread count continues to increase, but no additional users gain access to the server.

Check the Domino Directory and associated view rebuild time (Indexer)

If the Domino server takes a long time to update a view, a few different semaphore issues might start to appear. Typically we have observed that the database (0244), view collection (030B) and/or session table semaphores (0A0B) start appearing. Another symptom, which we describe in greater detail in the first part of this article, is seeing unauthenticated users appearing in the list output from the Show Users console command (the command output displays session numbers instead of user names). Again, this type of behavior occurs when the updates take a long time and the users cannot successfully authenticate to establish a session with the Domino server.

The Domino Directory typically has many changes in the form of updates, additions, and deletions. The number of changes also has an impact upon the server’s replication activity. The Domino Directory typically keeps information recent and up-to-date, because authentication process uses this information and many user and server initiated tasks access this central file. So you need to decide the delicate balance that works for your system -- find a place between the Domino Directory updating almost to the minute for view rebuilds and document content and the Domino Directory batching up some of the changes and thus not necessarily updating to the minute.

If you want to defer the rebuild on the Domino Directory, set the NOTES.INI setting SERVER_NAME_LOOKUP_NOUPDATE=1. This variable allows access to name lookup views in the Domino Directory during the view index process. It also allows access to otherwise locked views during the view process for authentication and mail routing purposes. See Lotus Customer Support Technote #150232, What Does the SERVER_NAME_LOOKUP_NOUPDATE=1 Server NOTES.INI Parameter Do? for more information on the use and benefit of this variable.

Analyze long view updates (Indexer)

We frequently spend time analyzing the view rebuild time. You can also be more proactive on this path. To perform an initial level of troubleshooting, think about the following related points:

  • Is it a large view?
  • Are there lots of views?
  • Are there a lot of updates taking place? A good way to tell if there are "a lot" of updates taking place is to start noticing if a view is constantly being updated or if users complain that they have to wait for a view to be available to them.

For example, if the updates are coming from the Replicator and the Replicator logging options are enabled (output is directed to the Domino Server console and log, see Domino 5 Administration Help for more information on enabling), then the Replicator displays the number of notes added/deleted/updated in each database. This information is helpful in terms of reflecting a level of activity, but it doesn't reveal what would trigger a view update. That is, it doesn’t reveal which notes are associated with a given view.

Internally, there is a semaphore protecting each view called the Collection semaphore. If a view is large, with mostly read-type activities performed on the notes (such as minimal new notes being added or existing notes being modified) and unread marks are disabled (which would cause updates to the view behind the scenes) the view update and display of a large view should perform well. Otherwise, you may want to investigate if splitting up the view into two or more parts is a viable option for your application.

In general, these are the important points to consider when addressing a view update concern on your Domino server:

  • Are you properly set up to take advantage of the R5 enhancements for Optimized Rebuild? For information, see Lotusphere99 presentation ID603, Maximizing DB performance, reliability, availability & scalability
  • Is your Buffer Pool specification initialized to a value, and is it too small?
  • Are you taking full advantage of understanding the output from the specifications LOG_VIEW_EVENTS and LOG_UPDATE?

Analyze whether the timeout is a real problem

Finally, as we mentioned in the first article, not all semaphore timeout messages indicate a serious problem. Using information from this and the first article on semaphores, you have techniques to help you learn what issues exist behind the timeout and can, thus, discern if the timeout is a temporary situation or a real problem.

This is an example of a timeout caused by a temporary situation: As part of the evaluation effort to put together this article, we artificially created a semaphore timeout situation by executing the Show Stat command frequently on a busy server. As a result, we incurred the semaphore timeout of 0116. The long Show Stat output (especially using the debug version of the code) may block another process as it executes. This type of situation is not serious and should only occur once for each Show Stat console command issued. At the beginning of this article we recommended becoming familiar with the output from the command, but note that we supplied specific parameters to view certain specific subsets of the Show Stat output. Alternatively, if your platform supports it, you can capture the same Domino server statistics as the results from the Show Stat command by looking at Stats & Events database. This database captures the same data automatically on a scheduled interval (performed by the Collector task) and is helpful when performing historical trend analysis.

Optimizing server performance: Semaphores Part1

What is a semaphore?

A semaphore is a software flag that ensures a server completes certain tasks before it can begin another. In systems that multitask, semaphores are a key ingredient in making those systems run smoothly and reliably.

Notes/Domino locks in a semaphore during one task and unlocks the semaphore when the task completes. For example, when the Indexer rebuilds an index, it locks a semaphore to prevent other tasks from using the view until it finishes rebuilding. If another task then tries to open that view, it has to wait for the Indexer to complete rebuilding the view index and unlock the semaphore.

What if a task waits and waits for a semaphore to unlock? Just as a driver waiting at an intersection becomes curious if the red light is slow to turn green, a user task – through Domino server internals – records how long it waits and triggers a semaphore timeout notification (when the NOTES.INI setting DEBUG_SHOW_TIMEOUT is set) if the wait exceeds 30 seconds. The semaphore handling logic then logs a message to the server console. The user task continues to wait for the semaphore to unlock, timing out every 30 seconds, until the "semaphore holder" task ends. In complicated processes, the task may have to wait through several timeouts. While the semaphore timeout message displays, end users may notice sluggish system performance or an unresponsive server, but end users do not see semaphore messages.


Why do semaphore timeouts occur?

Semaphore timeouts can occur in a number of situations. For example, a task locks a semaphore and then does not release it (see possible reasons listed below), leaving tasks in the queue waiting for that semaphore to unlock. Not all semaphore timeout messages signal a serious problem, though they likely indicate an active, fully loaded or possibly overloaded system. Knowing these scenarios can help you differentiate between a real problem and a temporary situation.

Reasons for semaphore timeouts include the following:

  • A heavy load on the server, causing processes to delay releasing semaphores.
  • A semaphore deadlock – where two tasks are waiting on the resources the other uses and neither can break the loop. For example, Task A locks Semaphore 1 and then tries to lock Semaphore 2. In the meantime, Task B has already locked Semaphore 2 and is now trying to lock Semaphore 1. Task A is stuck waiting for Semaphore 2 and Task B is waiting for Semaphore 1.
  • A process that fails to release a semaphore during execution, which then blocks another process awaiting the semaphore.
  • A situation where poorly written scripts or programs are in contention for the same database at the same time. For example, in an end user environment, this type of situation can occur if a program is verified to work in a low-scale test configuration but encounters problems when rolled out to a much larger scale production environment.

You can check to see if the system has experienced a semaphore timeout in one of two ways.

  • Type Show Stat Sem.Timeouts at the Domino server console. If a timeout has occurred, you receive a display similar to the following:

    Sem.Timeouts= 430D:58 0A12:42 030B:28 0116:26 0A12:21

  • Check the Domino server console for a message that reads:

    Session semaphore held for [n] seconds

    Make sure to enable the DEBUG_SHOW_TIMEOUT setting in NOTES.INI for these messages to display. These messages/errors do not print to the server log file.

Reading a semaphore timeout message

Domino/Notes records the type and count of semaphore timeouts in output from the Show Stat Sem.Timeouts console command. You can view the actual semaphore timeout message on the console as well or in the SEMDEBUG.TXT file. See the "Troubleshooting – Collecting information" section below for details on enabling the appropriate NOTES.INI parameters for message display.

Learning how to decipher a semaphore timeout message typically narrows down the focus area of concern. We provide a sample semaphore timeout message below, followed by detail on how to analyze it. This message is an example and the text may differ from message to message.

THREAD [009F:00BE -0012 ] WAITING FOR RWSEM 0x4245 open database queue semaphore
(@01070246) R=0,W=1,WRITER=009F:00AF-0025,1STREADER=009F:00AF-0025)FOR 5000 ms

The bold characters in the above message reflect new information added from Domino R4 to R5.

Note: The terminology within this article supports the platforms that have the process and thread architecture. These platforms include Windows NT, UNIX, and OS/2. The process ID corresponds to the Domino Add-in task or Domino (Main) Server task. As many of the tasks are multi-threaded, one or more threads may have an association with a given process. For platforms that do not support threads (but do support processes), such as Netware and Windows 95 and 98, the handling is slightly different.

In the example, 0x4245 is the semaphore in question. The text following 0x4245 tersely describes what type of semaphore is being requested (in this case, open database queue semaphore). Part two of this article includes information on the activity with which specific semaphores are associated.

The table below analyzes specific identifiers in the sample semaphore message. See the "Troubleshooting – Reading information" section below for help corresponding thread IDs with specific tasks.

IdentifiersMeaning
[ 009F :00BE-0012] 009F is the process that requested the semaphore. More specifically, extract 009F and record that as the process ID. This ID represents the process that wants to lock the semaphore even though another process may own the semaphore.
[009F: 00BE -0012] 00BE is the virtual thread ID. More specifically, extract 00BE and record that as the virtual thread ID. This information is the value used internally within Domino to identify the executing thread.
[009F:00BE- 0012 ] 0012 is the thread ID that the platform operating system knows. This information is important to record for future debugging efforts and should be passed to Lotus Customer Support to help in troubleshooting. The physical thread information reporting is a result of the Domino R5 thread pool architecture enhancement.
009F :00AF 009F is also the process last locked the semaphore; AF is its virtual thread ID.

In the above example, the same process (009F) is trying to access the semaphore, but different threads are executing at different points in the logic. One thread has the semaphore and is blocking the other thread. Not all semaphore timeouts involve issues this complicated, but this example provides you with detail on one of the more challenging situations.

For more information on reading semaphore timeout messages, see the Sem.Timeouts Analysis sidebar and Lotus Customer Support Technote #112710, Semaphores and Semaphore Timeouts.

Troubleshooting Collecting information

To analyze why semaphore timeout messages occur and determine their level of severity, you need to collect information. In general, you should have an understanding of how your system works at different levels of activity. This section offers suggestions on methods of gathering and analyzing relevant information.

Part two of this article supplies additional troubleshooting advice, based on real-life semaphore timeout scenarios derived from customer observations.

Semaphore messages are not written to the server log file, so taking steps to collect information is even more crucial to troubleshooting. The following strategies collect information from your system that will help you troubleshoot semaphore timeouts.

Know your system behavior at a steady state and active state time

Steady state is the time that users are not stressing the system (such as the start of the day, when users start up at different times). Active state is the time when users are logging on simultaneously and are most active (such as mid-morning and afternoon).

Once you know the system profile at steady state, you can move on to learning how the Domino server behaves at a more active state. Thus, when changes occur gradually or as a spike, you can discern how the impact upon steady or active state performance has changed. Know the differences, the valid ranges, and the potential for change (based on projected future database usage, new applications, and so on) for your Domino server.

In particular, gather such information from the Domino server console using the following commands and their output:

  • Show Stat Sem.Timeouts – Displays the total count and type of Sem.Timeouts messages that occurred since the start of the Domino server.
  • Show Stat Server.Users – Displays the count of active users at the current time.
  • Show Stat Database.DbCache.AverageDbOpenTime – Displays the average time in seconds it takes to open Domino databases when servicing internal and user requests.
  • Show Tasks – Displays the status and activity of active tasks and their server.
  • Show Perf – Displays the date, time, number of transactions executed, and number of currently active users on a continuous minute-by-minute basis.
  • Show Dbs – Displays performance information about a database, such as the number of times a database has been opened, whether the database has been modified, and the number of times a user has had to wait for a lock on the database. This command is new for R5.

Make use of various tools

The following tools can give you information on active processes and their activity level and physical thread ID values. You can use the process information from the semaphore message described in the previous section as an index to locate the process of interest from the tool output.

  • Perfmon – On the Windows NT platform, use Perfmon to help augment the information Domino Administrator collects for performance of specific system and Domino components. In particular, you can view the activity level at which Add-in tasks operated via the Process stats.
  • Windows NT Task Manager – This tool helps you identify the process ID for each task. To view IDs, click the Processes tab and match the process in the Image Name column to the process ID in the PID column. When you view the process ID in a semaphore timeout message, you can then associate it with a task. For more information, see Lotus Customer Support Technote #175384, How To Utilize SEMDEBUG.TXT and Task Manager.
  • Qslice – On the Windows NT platform, this tool can show the active tasks at any point in time. It exposes physical thread ID values under the associated process ID.
  • NSD – Developed by Domino engineers, this tool is currently available for the UNIX platform as part of the Domino server package. It shows the stack for the thread in question. For more information, see the Troubleshooting recommendations for UNIX sidebar.

For more information on other platform tools and on how to improve server performance, download the Lotusphere presentation "How to Deploy Domino R5 for Performance and Scalability" from the Lotus IT Central Performance Zone Technical Library.

Turn on NOTES.INI settings

Various NOTES.INI settings, when enabled, may provide additional insight into a problem situation. In particular, turn on the following:

NOTES.INI SettingDescription
DEBUG_SHOW_
TIMEOUT=1
Displays timeout messages on the server console.
DEBUG_CAPTURE_
TIMEOUT=1
Writes timeout messages to a SEMDEBUG.TXT file. These messages do not write to the server log file.
DEBUG_SEM_
TIMEOUT= n
Rate at which the Domino server console reports semaphore timeout messages in milliseconds. Default is 30000 milliseconds or 30 seconds. Do not change the default setting without contacting Lotus Customer Support.
DEBUG_THREADID=1Use to correlate messages from the DEBUG_OUTFILE with specific process and thread IDs.
LOG_UPDATE =2 LOG_VIEW_EVENTS=1Use to delve deeper into database and Indexer related issues.
NSF_BUFFER_POOL_
SIZE= value
Use only if your system is in a partition configuration. If not, accept the default setting (the underlying code makes a calculation based on a fixed overhead and total memory available). For information on calculating a setting for your partition system, see "Optimizing server performance: Port encryption and Buffer Pool settings" or the Lotusphere presentation Deploying Domino R5 for performance and scalability.
SERVER_MAX_
CONCURRENT_ TRANS= n
Use to achieve greater yield from Domino and to guarantee a certain level of activity when many users access the server simultaneously. Default is 20. This can also increase the semaphores available and enables the server to be more efficient when dealing with large bursts of activity. This applies only to Domino configurations that do not support the latest thread pool architecture enhancements (which are found in R5).
SERVER_SHOW_
PERFORMANCE=1
Enables additional information on the server console output, including the number of transactions executed in the past minute and the current number of server users.
DEBUG_OUTFILE= path file name Creates a file that captures server console output, which you can use to correlate messages with specific process and thread IDs. Use this file in conjunction with DEBUG_THREADID. This file does take up disk space and CPU to execute, so you should disable after use. For more information, see Lotus Customer Support Technote #162400, How Should the DEBUG_OUTFILE Parameter Be Implemented?

For more information on Domino server commands and NOTES.INI settings, see Domino 5 Administration Help.

Troubleshooting – Reading information

In many cases, you may have to deduce problems by eliminating certain causes. The following strategies offer advice on reading information to troubleshoot the cause of a semaphore timeout.

Read the server log file

A good place to start analyzing semaphore behavior is in the server log file, where output from console commands appears. In particular, the output from the commands Show Stat Server.Users and Show Stat Sem.Timeouts provide insight into activity that has changed over a given time period.

To determine trend patterns, analyze the server log file on a daily and weekly basis. When analyzing a situation that needs closer monitoring, you may want to capture the Domino server console information within a one to four hour range. This information makes you better prepared to correlate output from Server.Users and Sem.Timeouts console commands to output from the log file. Also, remain aware of how the values change as usage on the Domino server increases. Note that even though semaphore timeout messages do not appear in the server log file, the total count of Sem.Timeouts increases with each semaphore message displayed on the console.

View output from LOG_UPDATE and LOG_VIEW_EVENTS

In the server log file, view the output from the LOG_UPDATE and LOG_VIEW_EVENTS commands when you believe the issue is database or Indexer related.

The LOG_UPDATE command tells you about indexing activity that the Domino Update task performs, providing the name of the database, the name of the view being updated, and status information. If the LOG_UPDATE command is set, and the messages regarding the updating views display quickly on the Domino server console, the views are current (that is, the messages display as each view is checked for recent activity).

The output from the LOG_VIEW_EVENTS command tells you the reason for activity, such as rebuilding a view, which the Update (Indexer) task also performs. For example, you can see when and which views are being rebuilt in the more popular databases such as the Domino Directory or Mail.Box. Also, you can start to put some of these clues together. For example, if you notice that a given view is frequently listed as being rebuilt, you might want to check into whether available disk space is a limiting issue. If there is not enough disk space, the view rebuild process never completes successfully, and the Domino Administrator notices that the system keeps trying to rebuild the views.

View output from DEBUG_THREADID

In the server log file, view the output from DEBUG_THREADID to correlate the DEBUG_OUTFILE file messages with a process ID or a thread ID. You need to cross-reference DEBUG_THREADID output with your OS listing of processes and their associated IDs. See the "Troubleshooting – Collecting information" section above for more information on process IDs. We recommend that you contact Lotus Customer Support when using DEBUG_THREADID as they can guide you through the analysis.

Read Database.DbCache.AverageDbOpenTime information

This statistic keeps track of the average time, in seconds, it takes to open Domino databases when servicing internal and end user requests. You should review this value on a continuous basis, noting its trends over time (if your system typically assumes a consistent behavior over time). This value can serve as an indication of system problems, including system overload. Always keep in mind that one statistic does not provide you with an overall picture of the health of the server. You should review other statistics as well for a more complete picture.

View activity from Domino Add-in tasks

When analyzing the semaphore timeout changes in synchronization with the server log output, you can also see activity the other Domino Add-in tasks performed. These tasks, on their own or in conjunction with another Domino component, may also have a role in the semaphore timeout.

Reviewing the comment field displayed next to the various Add-in tasks (viewed through output from Show Tasks command) gives you additional information about what the task is doing. You can also monitor if a task is stuck, which is the case when a status message doesn't change over a reasonable period of time. Some exceptions, such as when a very large view is being rebuilt, may occur here. Again, your familiarity with the system and its behavior can help you spot a true problem situation.

View Domino server user count

When reviewing messages from the Domino server console or from the server log, view the Domino server user count along with the other reported transactions. To turn on display of the user count, first make sure the NOTES.INI setting SERVER_SHOW_PERFORMANCE is set, or issue the Domino server console command Show Perf. From then on, the Domino server user count automatically appears on the server console and becomes part of the log. This information display occurs on a continuous minute-by-minute basis. It contains the date, time, number of transactions executed over the past minute, and number of currently active users.

Analyze Sem.Timeouts information

The Sem.Timeouts output provides a report of the type and the count of semaphore timeouts. The output line can list more than one semaphore timeout type. A large jump in a timeout count over the monitoring period provides good reason to investigate further. As a gauge, an increase in a given semaphore count by approximately 5 to 10 in an hour is an indication that an area needs further investigation.

With the enhanced output in SEMDEBUG.TXT, STATREP.NSF (where statistics are captured), and the server log file, you can isolate the approximate time when there was a higher rate of timeouts for a particular semaphore. Also, try to determine in what area of the product the semaphore timeout occurred (such as, a task within the Domino server attempts to execute down a given path and is blocked). See the Sem.Timeouts Analysis sidebar and Lotus Customer Support Technote #112710, Semaphores and Semaphore Timeouts, to understand Sem.Timeouts information, particularly how it keeps track of which timeouts occurred and their frequency.

Note that you cannot re-initialize Sem.Timeouts stats, so any change or analysis occurs in terms of the first value captured.

Analyze semaphore timeout messages

The actual semaphore timeout messages are written to the SEMDEBUG.TXT file when you have enabled the NOTES.INI setting DEBUG_CAPTURE_TIMEOUT. Using one of the platform tools (for example, the Windows NT Task Manager) to help you translate the information reported in these messages, you can learn more about the semaphore under contention, including the tasks involved. See the "Reading a semaphore timeout message" section earlier in this article for help in deciphering a message.

Eliminate extraneous Add-in tasks

If possible, eliminate any extraneous Domino server Add-in tasks, either internal or external (such as an end-user-developed application or program) to Domino. This action helps to minimize the server load and to reduce opportunities for contention, especially in cases where the semaphore messages act as a warning for an overloaded system.

You may want to treat the removal of externally developed Add-in tasks differently than those developed internally. Sometimes externally developed tasks interfere or conflict with the overall processing of Domino Add-in tasks. You can remove tasks temporarily to see if the system experiences any changes. If the semaphore messages disappear, appear at a different rate, or if different messages appear, you now have additional insight as to the cause and effect.

Review NSF_BUFFER_POOL_SIZE

We recommend accepting the default value of the NOTES.INI setting NSF_BUFFER_POOL_SIZE. If you do specify a setting, you should review this setting after upgrading the system to a new Domino release or when system configuration changes. This guideline of accepting the default specification applies to single partition configurations.

Check SERVER_MAX_CONCURRENT_TRANS

Look to see if this NOTES.INI setting is too high. The default for this setting is 20, and we recommend increasing it in increments of 20. If Notes clients are timing out, then this value is set too high. Part two of this article contains more information on this setting.

Consider changing the DEBUG_SEM_TIMEOUT default

Before you change this NOTES.INI setting, contact Lotus Support for assistance. DEBUG_SEM_TIMEOUT controls the rate in milliseconds at which the semaphore blockages are reported and enables an administrator to be told, at a given time threshold, how long a semaphore request is blocked. For your environment, there might be reason to change from the default from 30000 (30 seconds).

We recommend the next reasonable value as 60000 (60 seconds), but you should base final adjustment on your understanding of the server's behavior and what you want to accomplish. For example, you may want to adjust the default value for DEBUG_SEM_TIMEOUT to something else when running an overused server, because an overused server is more prone to displaying semaphore timeout messages. And you presumably already have the expectation to see semaphore timeout messages (as the messages act as warning about system overuse) based on the workload, so increasing the time interval minimizes the acknowledged situation.

Conclusion

Again, it is worth restating that the presence of semaphore timeouts are not always a bad thing for the system. In fact, a healthy system is likely to encounter semaphore timeouts too. Learning about semaphore notifications and how they occur can help you maintain a reliable and efficient system.

In this article, you have learned what semaphores are, why they produce timeouts, and how you can check if your system has encountered any. You also know how to read timeout messages and how to troubleshoot the cause of the timeout messages. Part two of this article provides additional troubleshooting advice, this time based on real-life scenarios. The second part of this article also gives tips on minimizing semaphore issues in your system.

Optimizing server performance: Semaphores

What is a semaphore?

A semaphore is a software flag that ensures a server completes certain tasks before it can begin another. In systems that multitask, semaphores are a key ingredient in making those systems run smoothly and reliably.

Notes/Domino locks in a semaphore during one task and unlocks the semaphore when the task completes. For example, when the Indexer rebuilds an index, it locks a semaphore to prevent other tasks from using the view until it finishes rebuilding. If another task then tries to open that view, it has to wait for the Indexer to complete rebuilding the view index and unlock the semaphore.

What if a task waits and waits for a semaphore to unlock? Just as a driver waiting at an intersection becomes curious if the red light is slow to turn green, a user task – through Domino server internals – records how long it waits and triggers a semaphore timeout notification (when the NOTES.INI setting DEBUG_SHOW_TIMEOUT is set) if the wait exceeds 30 seconds. The semaphore handling logic then logs a message to the server console. The user task continues to wait for the semaphore to unlock, timing out every 30 seconds, until the "semaphore holder" task ends. In complicated processes, the task may have to wait through several timeouts. While the semaphore timeout message displays, end users may notice sluggish system performance or an unresponsive server, but end users do not see semaphore messages.

Why do semaphore timeouts occur?

Semaphore timeouts can occur in a number of situations. For example, a task locks a semaphore and then does not release it (see possible reasons listed below), leaving tasks in the queue waiting for that semaphore to unlock. Not all semaphore timeout messages signal a serious problem, though they likely indicate an active, fully loaded or possibly overloaded system. Knowing these scenarios can help you differentiate between a real problem and a temporary situation.

Reasons for semaphore timeouts include the following:

  • A heavy load on the server, causing processes to delay releasing semaphores.
  • A semaphore deadlock – where two tasks are waiting on the resources the other uses and neither can break the loop. For example, Task A locks Semaphore 1 and then tries to lock Semaphore 2. In the meantime, Task B has already locked Semaphore 2 and is now trying to lock Semaphore 1. Task A is stuck waiting for Semaphore 2 and Task B is waiting for Semaphore 1.
  • A process that fails to release a semaphore during execution, which then blocks another process awaiting the semaphore.
  • A situation where poorly written scripts or programs are in contention for the same database at the same time. For example, in an end user environment, this type of situation can occur if a program is verified to work in a low-scale test configuration but encounters problems when rolled out to a much larger scale production environment.

You can check to see if the system has experienced a semaphore timeout in one of two ways.

  • Type Show Stat Sem.Timeouts at the Domino server console. If a timeout has occurred, you receive a display similar to the following:

    Sem.Timeouts= 430D:58 0A12:42 030B:28 0116:26 0A12:21

  • Check the Domino server console for a message that reads:

    Session semaphore held for [n] seconds

    Make sure to enable the DEBUG_SHOW_TIMEOUT setting in NOTES.INI for these messages to display. These messages/errors do not print to the server log file.

Reading a semaphore timeout message

Domino/Notes records the type and count of semaphore timeouts in output from the Show Stat Sem.Timeouts console command. You can view the actual semaphore timeout message on the console as well or in the SEMDEBUG.TXT file. See the "Troubleshooting – Collecting information" section below for details on enabling the appropriate NOTES.INI parameters for message display.

Learning how to decipher a semaphore timeout message typically narrows down the focus area of concern. We provide a sample semaphore timeout message below, followed by detail on how to analyze it. This message is an example and the text may differ from message to message.

      THREAD [009F:00BE -0012 ] WAITING FOR RWSEM 0x4245      open database queue semaphore (@01070246)      (R=0,W=1,WRITER=009F:00AF -0025,1STREADER=009F:00AF -0025) FOR 5000 ms

The bold characters in the above message reflect new information added from Domino R4 to R5.

Note: The terminology within this article supports the platforms that have the process and thread architecture. These platforms include Windows NT, UNIX, and OS/2. The process ID corresponds to the Domino Add-in task or Domino (Main) Server task. As many of the tasks are multi-threaded, one or more threads may have an association with a given process. For platforms that do not support threads (but do support processes), such as Netware and Windows 95 and 98, the handling is slightly different.

In the example, 0x4245 is the semaphore in question. The text following 0x4245 tersely describes what type of semaphore is being requested (in this case, open database queue semaphore). Part two of this article includes information on the activity with which specific semaphores are associated.

The table below analyzes specific identifiers in the sample semaphore message. See the "Troubleshooting – Reading information" section below for help corresponding thread IDs with specific tasks.

IdentifiersMeaning
[ 009F :00BE-0012] 009F is the process that requested the semaphore. More specifically, extract 009F and record that as the process ID. This ID represents the process that wants to lock the semaphore even though another process may own the semaphore.
[009F: 00BE -0012] 00BE is the virtual thread ID. More specifically, extract 00BE and record that as the virtual thread ID. This information is the value used internally within Domino to identify the executing thread.
[009F:00BE- 0012 ] 0012 is the thread ID that the platform operating system knows. This information is important to record for future debugging efforts and should be passed to Lotus Customer Support to help in troubleshooting. The physical thread information reporting is a result of the Domino R5 thread pool architecture enhancement.
009F :00AF 009F is also the process last locked the semaphore; AF is its virtual thread ID.

In the above example, the same process (009F) is trying to access the semaphore, but different threads are executing at different points in the logic. One thread has the semaphore and is blocking the other thread. Not all semaphore timeouts involve issues this complicated, but this example provides you with detail on one of the more challenging situations.

For more information on reading semaphore timeout messages, see the Sem.Timeouts Analysis sidebar and Lotus Customer Support Technote #112710, Semaphores and Semaphore Timeouts.

Troubleshooting – Collecting information

To analyze why semaphore timeout messages occur and determine their level of severity, you need to collect information. In general, you should have an understanding of how your system works at different levels of activity. This section offers suggestions on methods of gathering and analyzing relevant information.

Part two of this article supplies additional troubleshooting advice, based on real-life semaphore timeout scenarios derived from customer observations.

Semaphore messages are not written to the server log file, so taking steps to collect information is even more crucial to troubleshooting. The following strategies collect information from your system that will help you troubleshoot semaphore timeouts.

Know your system behavior at a steady state and active state time

Steady state is the time that users are not stressing the system (such as the start of the day, when users start up at different times). Active state is the time when users are logging on simultaneously and are most active (such as mid-morning and afternoon).

Once you know the system profile at steady state, you can move on to learning how the Domino server behaves at a more active state. Thus, when changes occur gradually or as a spike, you can discern how the impact upon steady or active state performance has changed. Know the differences, the valid ranges, and the potential for change (based on projected future database usage, new applications, and so on) for your Domino server.

In particular, gather such information from the Domino server console using the following commands and their output:

  • Show Stat Sem.Timeouts – Displays the total count and type of Sem.Timeouts messages that occurred since the start of the Domino server.
  • Show Stat Server.Users – Displays the count of active users at the current time.
  • Show Stat Database.DbCache.AverageDbOpenTime – Displays the average time in seconds it takes to open Domino databases when servicing internal and user requests.
  • Show Tasks – Displays the status and activity of active tasks and their server.
  • Show Perf – Displays the date, time, number of transactions executed, and number of currently active users on a continuous minute-by-minute basis.
  • Show Dbs – Displays performance information about a database, such as the number of times a database has been opened, whether the database has been modified, and the number of times a user has had to wait for a lock on the database. This command is new for R5.

Make use of various tools

The following tools can give you information on active processes and their activity level and physical thread ID values. You can use the process information from the semaphore message described in the previous section as an index to locate the process of interest from the tool output.

  • Perfmon – On the Windows NT platform, use Perfmon to help augment the information Domino Administrator collects for performance of specific system and Domino components. In particular, you can view the activity level at which Add-in tasks operated via the Process stats.
  • Windows NT Task Manager – This tool helps you identify the process ID for each task. To view IDs, click the Processes tab and match the process in the Image Name column to the process ID in the PID column. When you view the process ID in a semaphore timeout message, you can then associate it with a task. For more information, see Lotus Customer Support Technote #175384, How To Utilize SEMDEBUG.TXT and Task Manager.
  • Qslice – On the Windows NT platform, this tool can show the active tasks at any point in time. It exposes physical thread ID values under the associated process ID.
  • NSD – Developed by Domino engineers, this tool is currently available for the UNIX platform as part of the Domino server package. It shows the stack for the thread in question. For more information, see the Troubleshooting recommendations for UNIX sidebar.

For more information on other platform tools and on how to improve server performance, download the Lotusphere presentation "How to Deploy Domino R5 for Performance and Scalability" from the Lotus IT Central Performance Zone Technical Library.

Turn on NOTES.INI settings

Various NOTES.INI settings, when enabled, may provide additional insight into a problem situation. In particular, turn on the following:

NOTES.INI SettingDescription
DEBUG_SHOW_
TIMEOUT=1
Displays timeout messages on the server console.
DEBUG_CAPTURE_
TIMEOUT=1
Writes timeout messages to a SEMDEBUG.TXT file. These messages do not write to the server log file.
DEBUG_SEM_
TIMEOUT= n
Rate at which the Domino server console reports semaphore timeout messages in milliseconds. Default is 30000 milliseconds or 30 seconds. Do not change the default setting without contacting Lotus Customer Support.
DEBUG_THREADID=1Use to correlate messages from the DEBUG_OUTFILE with specific process and thread IDs.
LOG_UPDATE =2 LOG_VIEW_EVENTS=1Use to delve deeper into database and Indexer related issues.
NSF_BUFFER_POOL_
SIZE= value
Use only if your system is in a partition configuration. If not, accept the default setting (the underlying code makes a calculation based on a fixed overhead and total memory available). For information on calculating a setting for your partition system, see "Optimizing server performance: Port encryption and Buffer Pool settings" or the Lotusphere presentation Deploying Domino R5 for performance and scalability.
SERVER_MAX_
CONCURRENT_ TRANS= n
Use to achieve greater yield from Domino and to guarantee a certain level of activity when many users access the server simultaneously. Default is 20. This can also increase the semaphores available and enables the server to be more efficient when dealing with large bursts of activity. This applies only to Domino configurations that do not support the latest thread pool architecture enhancements (which are found in R5).
SERVER_SHOW_
PERFORMANCE=1
Enables additional information on the server console output, including the number of transactions executed in the past minute and the current number of server users.
DEBUG_OUTFILE= path file name Creates a file that captures server console output, which you can use to correlate messages with specific process and thread IDs. Use this file in conjunction with DEBUG_THREADID. This file does take up disk space and CPU to execute, so you should disable after use. For more information, see Lotus Customer Support Technote #162400, How Should the DEBUG_OUTFILE Parameter Be Implemented?

For more information on Domino server commands and NOTES.INI settings, see Domino 5 Administration Help.

Troubleshooting – Reading information

In many cases, you may have to deduce problems by eliminating certain causes. The following strategies offer advice on reading information to troubleshoot the cause of a semaphore timeout.

Read the server log file

A good place to start analyzing semaphore behavior is in the server log file, where output from console commands appears. In particular, the output from the commands Show Stat Server.Users and Show Stat Sem.Timeouts provide insight into activity that has changed over a given time period.

To determine trend patterns, analyze the server log file on a daily and weekly basis. When analyzing a situation that needs closer monitoring, you may want to capture the Domino server console information within a one to four hour range. This information makes you better prepared to correlate output from Server.Users and Sem.Timeouts console commands to output from the log file. Also, remain aware of how the values change as usage on the Domino server increases. Note that even though semaphore timeout messages do not appear in the server log file, the total count of Sem.Timeouts increases with each semaphore message displayed on the console.

View output from LOG_UPDATE and LOG_VIEW_EVENTS

In the server log file, view the output from the LOG_UPDATE and LOG_VIEW_EVENTS commands when you believe the issue is database or Indexer related.

The LOG_UPDATE command tells you about indexing activity that the Domino Update task performs, providing the name of the database, the name of the view being updated, and status information. If the LOG_UPDATE command is set, and the messages regarding the updating views display quickly on the Domino server console, the views are current (that is, the messages display as each view is checked for recent activity).

The output from the LOG_VIEW_EVENTS command tells you the reason for activity, such as rebuilding a view, which the Update (Indexer) task also performs. For example, you can see when and which views are being rebuilt in the more popular databases such as the Domino Directory or Mail.Box. Also, you can start to put some of these clues together. For example, if you notice that a given view is frequently listed as being rebuilt, you might want to check into whether available disk space is a limiting issue. If there is not enough disk space, the view rebuild process never completes successfully, and the Domino Administrator notices that the system keeps trying to rebuild the views.

View output from DEBUG_THREADID

In the server log file, view the output from DEBUG_THREADID to correlate the DEBUG_OUTFILE file messages with a process ID or a thread ID. You need to cross-reference DEBUG_THREADID output with your OS listing of processes and their associated IDs. See the "Troubleshooting – Collecting information" section above for more information on process IDs. We recommend that you contact Lotus Customer Support when using DEBUG_THREADID as they can guide you through the analysis.

Read Database.DbCache.AverageDbOpenTime information

This statistic keeps track of the average time, in seconds, it takes to open Domino databases when servicing internal and end user requests. You should review this value on a continuous basis, noting its trends over time (if your system typically assumes a consistent behavior over time). This value can serve as an indication of system problems, including system overload. Always keep in mind that one statistic does not provide you with an overall picture of the health of the server. You should review other statistics as well for a more complete picture.

View activity from Domino Add-in tasks

When analyzing the semaphore timeout changes in synchronization with the server log output, you can also see activity the other Domino Add-in tasks performed. These tasks, on their own or in conjunction with another Domino component, may also have a role in the semaphore timeout.

Reviewing the comment field displayed next to the various Add-in tasks (viewed through output from Show Tasks command) gives you additional information about what the task is doing. You can also monitor if a task is stuck, which is the case when a status message doesn't change over a reasonable period of time. Some exceptions, such as when a very large view is being rebuilt, may occur here. Again, your familiarity with the system and its behavior can help you spot a true problem situation.

View Domino server user count

When reviewing messages from the Domino server console or from the server log, view the Domino server user count along with the other reported transactions. To turn on display of the user count, first make sure the NOTES.INI setting SERVER_SHOW_PERFORMANCE is set, or issue the Domino server console command Show Perf. From then on, the Domino server user count automatically appears on the server console and becomes part of the log. This information display occurs on a continuous minute-by-minute basis. It contains the date, time, number of transactions executed over the past minute, and number of currently active users.

Analyze Sem.Timeouts information

The Sem.Timeouts output provides a report of the type and the count of semaphore timeouts. The output line can list more than one semaphore timeout type. A large jump in a timeout count over the monitoring period provides good reason to investigate further. As a gauge, an increase in a given semaphore count by approximately 5 to 10 in an hour is an indication that an area needs further investigation.

With the enhanced output in SEMDEBUG.TXT, STATREP.NSF (where statistics are captured), and the server log file, you can isolate the approximate time when there was a higher rate of timeouts for a particular semaphore. Also, try to determine in what area of the product the semaphore timeout occurred (such as, a task within the Domino server attempts to execute down a given path and is blocked). See the Sem.Timeouts Analysis sidebar and Lotus Customer Support Technote #112710, Semaphores and Semaphore Timeouts, to understand Sem.Timeouts information, particularly how it keeps track of which timeouts occurred and their frequency.

Note that you cannot re-initialize Sem.Timeouts stats, so any change or analysis occurs in terms of the first value captured.

Analyze semaphore timeout messages

The actual semaphore timeout messages are written to the SEMDEBUG.TXT file when you have enabled the NOTES.INI setting DEBUG_CAPTURE_TIMEOUT. Using one of the platform tools (for example, the Windows NT Task Manager) to help you translate the information reported in these messages, you can learn more about the semaphore under contention, including the tasks involved. See the "Reading a semaphore timeout message" section earlier in this article for help in deciphering a message.

Eliminate extraneous Add-in tasks

If possible, eliminate any extraneous Domino server Add-in tasks, either internal or external (such as an end-user-developed application or program) to Domino. This action helps to minimize the server load and to reduce opportunities for contention, especially in cases where the semaphore messages act as a warning for an overloaded system.

You may want to treat the removal of externally developed Add-in tasks differently than those developed internally. Sometimes externally developed tasks interfere or conflict with the overall processing of Domino Add-in tasks. You can remove tasks temporarily to see if the system experiences any changes. If the semaphore messages disappear, appear at a different rate, or if different messages appear, you now have additional insight as to the cause and effect.

Review NSF_BUFFER_POOL_SIZE

We recommend accepting the default value of the NOTES.INI setting NSF_BUFFER_POOL_SIZE. If you do specify a setting, you should review this setting after upgrading the system to a new Domino release or when system configuration changes. This guideline of accepting the default specification applies to single partition configurations.

Check SERVER_MAX_CONCURRENT_TRANS

Look to see if this NOTES.INI setting is too high. The default for this setting is 20, and we recommend increasing it in increments of 20. If Notes clients are timing out, then this value is set too high. Part two of this article contains more information on this setting.

Consider changing the DEBUG_SEM_TIMEOUT default

Before you change this NOTES.INI setting, contact Lotus Support for assistance. DEBUG_SEM_TIMEOUT controls the rate in milliseconds at which the semaphore blockages are reported and enables an administrator to be told, at a given time threshold, how long a semaphore request is blocked. For your environment, there might be reason to change from the default from 30000 (30 seconds).

We recommend the next reasonable value as 60000 (60 seconds), but you should base final adjustment on your understanding of the server's behavior and what you want to accomplish. For example, you may want to adjust the default value for DEBUG_SEM_TIMEOUT to something else when running an overused server, because an overused server is more prone to displaying semaphore timeout messages. And you presumably already have the expectation to see semaphore timeout messages (as the messages act as warning about system overuse) based on the workload, so increasing the time interval minimizes the acknowledged situation.

Conclusion

Again, it is worth restating that the presence of semaphore timeouts are not always a bad thing for the system. In fact, a healthy system is likely to encounter semaphore timeouts too. Learning about semaphore notifications and how they occur can help you maintain a reliable and efficient system.

In this article, you have learned what semaphores are, why they produce timeouts, and how you can check if your system has encountered any. You also know how to read timeout messages and how to troubleshoot the cause of the timeout messages. Part two of this article provides additional troubleshooting advice, this time based on real-life scenarios. The second part of this article also gives tips on minimizing semaphore issues in your system.