FairCom — Troubleshooting and Debugging

In This Chapter

Failures During FairCom DB Startup

This section discusses failures that may occur when starting FairCom DB.

In This Section

Server Fails to Start

Workaround - Failed to Bind SHMEM

Server Startup Hangs or Takes Excessive Time

Java 1.7 uses large amount of virtual memory at startup

Server Fails to Start

There are situations in which an attempt to start a FairCom DB process can fail. A failed server startup can be detected by examining the list of processes running on the system. If ctsrvr (the name of the server binary) is not shown as a running process after the server is started, the server startup has failed.

The ps or pgrep utilities can be used on Unix systems to list active processes.

The Windows Task Manager can be used on Windows systems to list active processes.
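On Linux, the check that ps or pgrep performs can also be done from C by scanning /proc. The following is a minimal sketch, assuming a Linux system with /proc mounted; it is not a FairCom utility. It compares the given name against each /proc/<pid>/comm entry, which the kernel truncates to 15 characters (ctsrvr fits). On Windows, the Task Manager or the Windows process APIs would be used instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if a process whose /proc/<pid>/comm equals name is found,
 * 0 if none is found, or -1 if /proc cannot be read (e.g. not Linux). */
int process_running(const char *name)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    struct dirent *entry;
    int found = 0;
    while (!found && (entry = readdir(proc)) != NULL) {
        /* Only purely numeric entries are process IDs; skip "self" etc. */
        if (!isdigit((unsigned char)entry->d_name[0]))
            continue;

        char path[300];
        snprintf(path, sizeof path, "/proc/%s/comm", entry->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* process exited meanwhile, or access denied */

        char comm[64] = {0};
        if (fgets(comm, sizeof comm, f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0'; /* strip trailing newline */
            found = (strcmp(comm, name) == 0);
        }
        fclose(f);
    }
    closedir(proc);
    return found;
}
```

For example, if process_running("ctsrvr") returns 0 shortly after a startup attempt, the startup failed and CTSTATUS.FCS is the next place to look.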

In the event of a failed FairCom DB startup, the server logs error messages to CTSTATUS.FCS and sometimes to the server console. Check for errors in these locations first in order to understand the reason the server failed to start. If FairCom DB has successfully started in the past, consider what might have changed since the last successful server startup (such as server configuration file options, server binaries, etc.).

The following are possible causes of a failed server startup:

In This Section

Unactivated c-tree Server (DEPRECATED SUPPORT)

Missing or Incorrect Configuration File

Unrecognized Keyword in Server Configuration File

Server Fails to Open Server Administrative Files

Missing Server Binary or Communication DLLs

Server Cannot Initialize Communication Protocol

Missing or Corrupt Server Settings File

Automatic Recovery Fails

A Server is Already Running in the Working Directory

Dynamic Dump Cannot Be Scheduled

Server Startup Terminates Abnormally

Unactivated c-tree Server (DEPRECATED SUPPORT)

If FairCom DB (c-tree server versions V9 and prior) is not pre-activated by FairCom, it must be activated before it is started. If an unactivated server is started, the server writes the following message to CTSTATUS.FCS and terminates:

Thu Sep 25 15:42:38 2003
- User# 01SERVER NEEDS ACTIVATION KEY!
Execute 'fcactvat' program first

To resolve this type of failed startup, activate the c-tree Server using the c-tree Server Activation Utility. (DEPRECATED: only applies to c-tree Server versions 9 and prior)

Missing or Incorrect Configuration File

The server may fail to start if the server configuration file is missing or contains settings that are inconsistent with those used previously when starting FairCom DB. For example, if the server configuration file specifies a PAGE_SIZE setting that differs from the setting used when the server created its FAIRCOM.FCS file, the server is unable to open FAIRCOM.FCS and server startup fails. In this situation, the server logs error details to CTSTATUS.FCS.

To resolve this type of failed startup, review the startup errors logged to CTSTATUS.FCS and start the server using a server configuration file with the appropriate configuration settings.

Unrecognized Keyword in Server Configuration File

If the server configuration file used when starting FairCom DB contains a keyword that is used incorrectly or is not recognized, FairCom DB fails to start up. The example below shows the messages logged to CTSTATUS.FCS when an unrecognized keyword <unrecognized_keyword> is specified in the server configuration file:

Thu Sep 25 16:45:06 2003
 - User# 01DO NOT RECOGNIZE CONFIGURATION KEYWORD...: 2
Thu Sep 25 16:45:06 2003
 - User# 01<unrecognized_keyword>: 2
Thu Sep 25 16:45:06 2003
 - User# 01O1 M2 L73 F9 P0x (recur #1) (uerr_cod=0)

To resolve this type of failed startup, review the errors logged to CTSTATUS.FCS and make the appropriate changes to the server configuration file.

Server Fails to Open Server Administrative Files

FairCom DB will fail to start if it is unable to open its administrative files, which include the files FAIRCOM.FCS, SYSLOGDT.FCS, and SYSLOGIX.FCS. In this situation, check the server status log and look for a message such as the following:

Wed Oct 1 12:34:25 2003
- User# 01 Could not initialize server. Error: 14

If the server fails to start up and logs this message to the server status log, attempt to open FAIRCOM.FCS, SYSLOGDT.FCS, and SYSLOGIX.FCS to determine which of these files failed to open. The files SYSLOGDT.FCS and SYSLOGIX.FCS are the server’s event logs. If the server fails to open them, they can be safely deleted (if their contents are not of interest), rebuilt, or moved out of the server directory; when they are deleted or moved, the server re-creates them during startup. The file FAIRCOM.FCS contains user and group definitions. If the server fails to open this file and the server’s default user and group account settings have not been changed, FAIRCOM.FCS can be deleted and the server will re-create it during startup. If the server administrator has made changes to the user and group account settings, rebuild this file or restore it from backup.
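The file-open check described above can be scripted. The following minimal sketch assumes that server_dir is the server's working directory and that "/" is an acceptable path separator (both assumptions). It tries to open each administrative file read-only and reports the operating-system error for any that fail. It only detects operating-system-level failures such as a missing file or insufficient permissions; c-tree-level problems, such as a PAGE_SIZE mismatch, are reported only by the server itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The administrative files named in this section. */
static const char *const ADMIN_FILES[] = {
    "FAIRCOM.FCS", "SYSLOGDT.FCS", "SYSLOGIX.FCS",
};

/* Tries to open each administrative file read-only in server_dir and prints
 * the operating-system error for each one that fails. Returns the number of
 * files that could not be opened. */
int check_admin_files(const char *server_dir)
{
    size_t count = sizeof ADMIN_FILES / sizeof ADMIN_FILES[0];
    int failures = 0;

    for (size_t i = 0; i < count; i++) {
        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", server_dir, ADMIN_FILES[i]);
        if (n < 0 || (size_t)n >= sizeof path) {
            fprintf(stderr, "path too long for %s\n", ADMIN_FILES[i]);
            failures++;
            continue;
        }

        FILE *f = fopen(path, "rb");
        if (f == NULL) {
            fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
            failures++;
        } else {
            fclose(f);
        }
    }
    return failures;
}
```

A return value of 0 means all three files opened at the operating-system level, which points the investigation toward the file contents or server settings rather than file access.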

If the message shown below appears in the server status log, the server was not able to open FAIRCOM.FCS because the current PAGE_SIZE setting differs from the page size used when FAIRCOM.FCS was created.

Wed Oct 01 12:46:38 2003
- User# 01 Could not process User/Group Information: 417

To correct this problem, change the PAGE_SIZE setting to the correct value, use the ctscmp utility to change the page size for FAIRCOM.FCS, or delete FAIRCOM.FCS.

If the server fails to start due to a failure to open its administrative files, consider adding DIAGNOSTICS LOWL_FILE_IO to the server configuration file. This diagnostic option causes the server to log messages showing filenames and system error codes for failed file open/close/delete/create/rename operations to the server status log.

Missing Server Binary or Communication DLLs

The c-tree Server can fail to start if the server binary is missing or, in the case of the Windows server, if a required communication DLL is missing. If the server binary is missing, the system command used to start the server typically outputs a message indicating that the binary was not found. In the event of a missing communication DLL, CTSTATUS.FCS shows a message such as the following:

Thu Sep 25 15:15:58 2003
 - User# 01     F_TCPIP: 145
Thu Sep 25 15:15:58 2003
 - User# 01     Could not establish logon area. Error: 143

To resolve this type of failed startup, review the startup errors reported by the server startup command or logged to CTSTATUS.FCS and copy the necessary binaries to the server’s working directory.

Server Cannot Initialize Communication Protocol

If FairCom DB is unable to initialize its communication subsystem at startup, it will terminate with an error. Examples of this type of failure include a missing communication DLL (as discussed above), improper system configuration for the specified communication protocol, or unavailable communication resources (for example, the TCP/IP port the server is attempting to use is already in use or otherwise unavailable).

To resolve this type of failed startup, review error messages logged to CTSTATUS.FCS to determine the reason for the failure. Correct the problem and restart the server.
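When CTSTATUS.FCS points to an unavailable TCP/IP port, one way to confirm that another process holds the port is to try binding it. The following is a minimal POSIX sketch; the port number is a parameter because this section does not state which port the server uses (check the server configuration). Run it while FairCom DB is stopped, since a running server's own listener also makes the port appear in use. A successful bind shows only that the port was free at that moment.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a TCP socket could bind to port on all IPv4 interfaces,
 * 0 if the bind failed (for example, the port is already in use), or -1 if
 * no socket could be created. The test socket is closed immediately. */
int tcp_port_available(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int ok = bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0;
    close(fd);
    return ok;
}
```

On Windows, the same idea applies using the Winsock equivalents of these calls.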

Missing or Corrupt Server Settings File

Some versions of FairCom DB require an encrypted server settings file to exist at startup. A server that requires a settings file will terminate at startup if the settings file is missing or if the server cannot read the settings file. In this situation, the server writes one of the following messages to CTSTATUS.FCS:

Thu Sep 25 16:41:10 2003
- User# 01     The Server's settings file is missing.
It is required to operate this server.
Thu Sep 25 16:42:01 2003
- User# 01     The current Server's settings file is invalid.
Use current FairCom utility to recreate the settings file.

To resolve this type of failed startup, review error messages logged to CTSTATUS.FCS. Make a good copy of the encrypted settings file available to the server and restart the server.

Automatic Recovery Fails

Each time FairCom DB starts, it examines its transaction logs to determine whether it needs to perform automatic recovery of TRNLOG files, and it performs that recovery automatically when it is required. When automatic recovery is successful, the server continues its normal startup processing. If automatic recovery fails, FairCom DB logs an error message to CTSTATUS.FCS and terminates.

To resolve this type of failed startup, review the error messages logged to CTSTATUS.FCS and take the appropriate action. See the "Automatic Recovery Fails" section in this chapter for a discussion of specific types of automatic recovery failures and what to do in each case.

A Server is Already Running in the Working Directory

Two FairCom DB instances cannot be running in the same working directory at the same time.

Note: Here the working directory refers to the directory in which the server stores its transaction logs and other *.FCS files.

If a server attempts to start in a directory in which another server is already running, the server fails to start up and logs the following error message to CTSTATUS.FCS:

Thu Sep 25 16:38:15 2003
 - User# 01     Is another server running in this workspace?
Thu Sep 25 16:38:15 2003
 - User# 01     O1 M99 L54 F537 P1x (recur #1) (uerr_cod=0)

To resolve this type of failed startup, shut down the running server or configure FairCom DB to start in a different working directory, then start FairCom DB.

Dynamic Dump Cannot Be Scheduled

If the server configuration file includes the DUMP keyword, the server opens the specified dynamic dump script file and attempts to schedule a dynamic dump. If the dump cannot be scheduled (for example, due to a missing dump script or a dump script with incorrect or unrecognized keywords), the server logs an error message to CTSTATUS.FCS and shuts down. Below is an example showing errors logged to CTSTATUS.FCS when a dynamic dump cannot be scheduled at server startup because the dump script file does not exist:

Thu Sep 25 16:48:43 2003
 - User# 01     DD: could not open script file...: 12
Thu Sep 25 16:48:43 2003
 - User# 01     my.scr: 12
Thu Sep 25 16:48:43 2003
 - User# 01     Could not schedule Dynamic Dump...: 5
Thu Sep 25 16:48:43 2003
 - User# 01     my.scr: 5

To resolve this type of failed startup, review the error messages logged to CTSTATUS.FCS to understand the specific cause of the failure. Correct the problem that prevented the scheduling of the dynamic dump and restart the server.

Server Startup Terminates Abnormally

In addition to the specific causes of a failed server startup, FairCom DB may terminate abnormally at startup for the following reasons:

The FairCom DB process encounters a fatal exception, causing the system to terminate the server process. In this case the system may produce a core image of the server process at the time of the exception. The FairCom DB status log may contain error messages related to the exception.
An administrator forcibly terminates the FairCom DB process. In this case, the process is abruptly terminated and the FairCom DB status log does not show an indication of a shutdown occurring.
The system on which FairCom DB is running terminates abnormally (due to power loss, operating system exception, or sudden system reboot). In this situation, the server process may be abruptly terminated, and the status log then does not show an indication of a shutdown occurring.
FairCom DB detects an unexpected internal server error situation known as a catend or a terr. In these cases, the server status log shows an error message containing details about the internal server error.

If a server startup terminates abnormally, follow these steps:

1. Examine system logs, application logs, and the server status log to determine the nature of the abnormal server termination.
2. If a fatal exception terminated the server process, save the core file if it exists.
3. If the server terminated due to a fatal exception or internal FairCom DB error, save a copy of the server’s *.FCS files and the server configuration file and, if time and disk space permit, a copy of all data and index files before restarting FairCom DB. These files can be used to analyze the abnormal server termination.
4. Consider whether any recent hardware or software changes (including server configuration file option changes) could explain the startup failure.
5. If the situation that led to the abnormal server termination can be understood by analyzing the server status log or other system logs, correct the problem that caused the server to terminate. For example, if the server terminated because insufficient disk space prevented it from writing to its transaction logs, free up disk space so the server has enough room for the transaction logs. (Caution: do not delete active transaction logs before the server performs its automatic recovery.)
Unlike an abnormal server termination that occurs after the server is operational, an abnormal server termination at server startup does not require special actions to ensure the integrity of PREIMG and non-transaction FairCom DB data and index files, because the application data and index files will not yet have been opened with active changes.
If the abnormal server termination occurred during automatic recovery, the TRNLOG files are in an unknown state. Automatic recovery must be successfully completed or the TRNLOG files re-created or restored from backup. See the “Automatic Recovery Fails” topic in the “Failures During System Recovery” section of this document for details on the steps to follow if the server terminates abnormally during automatic recovery.
If the server can be restarted and automatic recovery completes successfully, clients can connect to the server and can resume processing.
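For the insufficient-disk-space example above, the free space on the file system that holds the transaction logs can be checked before restarting the server. The following is a minimal POSIX sketch using statvfs(); the minimum-free-space threshold is a value you choose, since this document does not give one, and log_dir is assumed to be the directory where the server writes its transaction logs.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/statvfs.h>

/* Returns the number of bytes available to unprivileged processes on the
 * file system that contains path, or -1 if statvfs() fails. */
long long free_bytes(const char *path)
{
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return -1;
    /* f_bavail is counted in fragment-size (f_frsize) units. */
    return (long long)vfs.f_bavail * (long long)vfs.f_frsize;
}

/* Returns 1 if at least min_bytes are free for log_dir, 0 if not,
 * or -1 if the free space could not be determined. */
int enough_space_for_logs(const char *log_dir, long long min_bytes)
{
    long long avail = free_bytes(log_dir);
    if (avail < 0)
        return -1;
    return avail >= min_bytes;
}
```

On Windows, GetDiskFreeSpaceEx() provides comparable information.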

Workaround - Failed to Bind SHMEM

Sometimes, when starting up the database, the following error(s) may be triggered:

FSHAREMM: Failed to bind/listen on Unix domain socket for shared memory: 2
NewUser: Unable to create an instance of a named pipe

A workaround for this issue is to change the environment variables assigned at startup so that they use full (absolute) paths for the following areas:

temp
shared memory
data

Note: ctsrvr.cfg by default uses relative paths.

Switching these environment variables to full paths for the areas above should resolve the issue.

Server Startup Hangs or Takes Excessive Time

The FairCom DB startup process is usually very fast. FairCom DB logs a message of the form shown below to CTSTATUS.FCS when startup is complete and the server is ready for clients to connect:

- User# 00001 FairCom DB V12.0.0.111 SQL Server Is Operational -SN 39001664

If the FairCom DB process appears in the system process list but CTSTATUS.FCS does not yet show the above message indicating the server is operational, the most likely cause is that the server is performing automatic recovery and for some reason the recovery is taking a long time.

To determine if the long startup time is due to automatic recovery occurring, check the server status log. When the server begins automatic recovery, it writes the following message to the status log:

- User# 01 Beginning automatic recovery process

When the server completes automatic recovery, it writes the following message to the status log:

- User# 01 Automatic recovery completed

If the server has logged the first message to the status log but not the second, it has begun but has not yet completed automatic recovery.
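This check (the begin message logged without a matching completion message) can be automated over the text of CTSTATUS.FCS. The following minimal sketch searches for the two messages quoted above and considers only the most recent "Beginning" entry, so that a recovery completed during an earlier startup recorded in the same log does not hide one in progress. Reading the file into memory is left to the caller, and the message text is assumed to match the samples shown here exactly.

```c
#include <string.h>

enum recovery_state {
    RECOVERY_NOT_STARTED, /* no "Beginning" message found */
    RECOVERY_IN_PROGRESS, /* last "Beginning" has no completion after it */
    RECOVERY_COMPLETED    /* completion message follows the last "Beginning" */
};

/* Classifies status-log text using the two messages quoted above. */
enum recovery_state recovery_state_of(const char *log_text)
{
    static const char begin_msg[] = "Beginning automatic recovery process";
    static const char done_msg[] = "Automatic recovery completed";

    /* Locate the last occurrence of the begin message. */
    const char *last_begin = NULL;
    for (const char *p = strstr(log_text, begin_msg); p != NULL;
         p = strstr(p + 1, begin_msg))
        last_begin = p;

    if (last_begin == NULL)
        return RECOVERY_NOT_STARTED;
    return strstr(last_begin, done_msg) != NULL ? RECOVERY_COMPLETED
                                                : RECOVERY_IN_PROGRESS;
}
```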

If RECOVER_DETAILS YES is present in the server configuration file when the server is started, the server logs the time spent for each phase of recovery to the server status log. This option provides a way to more specifically monitor the progress of automatic recovery. Below is an example showing the order of automatic recovery phases. The server writes the entry for each recovery phase upon completion of that recovery phase, so the current recovery phase can be determined by examining the contents of the server status log.

 Mon Sep 29 10:14:34 2003
 - User# 01     Transaction scan time:    1 seconds.
Mon Sep 29 10:14:34 2003
 - User# 01     Beginning automatic recovery process
Mon Sep 29 10:14:34 2003
 - User# 01     Index repair time:        0 seconds.
Mon Sep 29 10:14:34 2003
 - User# 01     Index composition time:   0 seconds.
Mon Sep 29 10:14:34 2003
 - User# 01     Transaction undo time:    0 seconds for   2 transactions.
Mon Sep 29 10:14:34 2003
 - User# 01     Vulnerable data time:     0 seconds.
Mon Sep 29 10:14:34 2003
 - User# 01     Vulnerable index time:    0 seconds.
Mon Sep 29 10:14:34 2003
 - User# 01     Transaction redo time:    0 seconds for 843 transactions.
Mon Sep 29 10:14:34 2003
 - User# 01     Automatic recovery completed

Although the current recovery phase and time spent so far during recovery can be determined using the above approach, the log entries do not indicate how much time remains until recovery is complete.
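With RECOVER_DETAILS YES, the latest phase entry in the log identifies the most recently completed phase. The sketch below takes its phase labels from the sample output above (an assumption: other server versions may word or order them differently) and returns whichever label appears latest in the log text. Like the log entries themselves, it says nothing about how much time remains.

```c
#include <stddef.h>
#include <string.h>

/* Phase labels in the order shown in the RECOVER_DETAILS sample above. */
static const char *const RECOVERY_PHASES[] = {
    "Transaction scan time",  "Index repair time",
    "Index composition time", "Transaction undo time",
    "Vulnerable data time",   "Vulnerable index time",
    "Transaction redo time",
};

/* Returns the label of the phase entry that appears last in log_text, or
 * NULL if no phase entry is present. Because each entry is written when its
 * phase completes, this is the most recently completed phase. */
const char *last_completed_phase(const char *log_text)
{
    const char *best = NULL;
    const char *best_pos = NULL;
    size_t n = sizeof RECOVERY_PHASES / sizeof RECOVERY_PHASES[0];

    for (size_t i = 0; i < n; i++) {
        for (const char *p = strstr(log_text, RECOVERY_PHASES[i]); p != NULL;
             p = strstr(p + 1, RECOVERY_PHASES[i])) {
            if (best_pos == NULL || p > best_pos) {
                best_pos = p;
                best = RECOVERY_PHASES[i];
            }
        }
    }
    return best;
}
```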

If automatic recovery appears to hang or is taking a long time, the two options are to wait until recovery completes or to abandon recovery and restore TRNLOG data and index files from a backup. See the "Automatic Recovery Fails" topic in the “Failures During System Recovery” section of this document for details on these options.

If the server startup is taking a long time or appears to hang and the cause does not appear to be automatic recovery (based on the status log messages), check the state of the server process using system utilities as described in the “Monitoring FairCom DB Process State” section. If necessary, the server can be forcibly terminated and restarted without affecting the state of FairCom DB data and index files because the files will not have been open for end user modification at this point.

Java 1.7 uses large amount of virtual memory at startup

The Server can use a very large amount of virtual memory at startup when it is configured to use Java version 1.7. This is because the default behavior of the 1.7 JVM (the 64-bit JVM on Windows) is to reserve 1/4 of the total physical memory. This does not affect physical memory usage. When looking at the Windows Committed memory or Virtual memory usage, ctreesql may show a very large value at startup if the JVM is configured.

http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html

The JVM heap size can be limited with the following Server configuration keyword:


; Limit JVM maximum heap to 256 MB
SETENV DH_JVM_OPTION_STRINGS=-Xmx256m

The maximum heap size should be tuned based on how stored procedures are written and used (256 MB should be enough for simple usage). See the following documentation for details on Java tuning:

http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html

Failures During FairCom DB Operation

This section discusses failures that may occur during FairCom DB operation following a successful startup.

In This Section

Clients Cannot Connect to Server

Clients Lose Connection to Server

Number of Active Transaction Logs Unexpectedly Increases

Server Is in a Non-Responsive State

Some Clients Are In A Non-Responsive State

Errors Occur When Opening FairCom DB Files

Errors Occur When Reading or Writing FairCom DB Files

c-tree API Call Fails With Unexpected Error

Server Writes Unexpected Messages to Status Log

Server Exhibits Atypical Performance

Server Exhibits Unexpected Resource Usage

Dynamic Dump Fails

Data or Index File Sizes Grow Unexpectedly

Server Terminates Abnormally

Clients Cannot Connect to Server

To connect to FairCom DB, a client calls a FairCom DB API function such as InitCtree(), InitCtreeXtd(), InitISAM(), InitISAMXtd(), or CTDBLogon(). A connection attempt may fail for various reasons. Check the function return code to determine possible causes for the connection failure. The following sections list errors a FairCom DB API function may return in the event of a failed connection attempt and describe possible causes and troubleshooting steps for each:

In This Section

FairCom DB Error 10: SPAC_ERR

FairCom DB Error 84: MUSR_ERR

FairCom DB Error 127: ARQS_ERR

FairCom DB Error 128: ARSP_ERR

FairCom DB Error 133: ASKY_ERR

FairCom DB Error 150: SHUT_ERR

FairCom DB Error 162: SGON_ERR

FairCom DB Error 450: LUID_ERR

FairCom DB Error 451: LPWD_ERR

FairCom DB Error 452: LSRV_ERR

FairCom DB Error 470: LGST_ERR

FairCom DB Error 530: LMTC_ERR

FairCom DB Error 579: LIVL_ERR

FairCom DB Error 584: LRSM_ERR

FairCom DB Error 585: LVAL_ERR

FairCom DB Error 589: LADM_ERR

FairCom DB Error 593: XUSR_ERR

FairCom DB Error 609: LTPW_ERR
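When a connection attempt fails, logging the symbolic name alongside the numeric return code makes the logs easier to match against the sections below. The following table-driven sketch maps the codes listed above to their symbolic names and short descriptions taken from this section. It is a local lookup table for illustration only; an application built against the FairCom DB client library should use the error constants defined in that library's headers instead.

```c
#include <stddef.h>
#include <string.h>

/* One logon-related error as described in this section. */
struct logon_error {
    int code;            /* numeric return code */
    const char *symbol;  /* symbolic name used in this document */
    const char *meaning; /* short description from this section */
};

static const struct logon_error LOGON_ERRORS[] = {
    {10, "SPAC_ERR", "memory allocation error during logon"},
    {84, "MUSR_ERR", "maximum users exceeded"},
    {127, "ARQS_ERR", "could not send request"},
    {128, "ARSP_ERR", "could not receive answer"},
    {133, "ASKY_ERR", "server could not be located"},
    {150, "SHUT_ERR", "server is shutting down"},
    {162, "SGON_ERR", "server has gone away"},
    {450, "LUID_ERR", "invalid user ID"},
    {451, "LPWD_ERR", "invalid password"},
    {452, "LSRV_ERR", "could not process user or account information"},
    {470, "LGST_ERR", "guest logons disabled"},
    {530, "LMTC_ERR", "client does not match server"},
    {579, "LIVL_ERR", "logon interval error"},
    {584, "LRSM_ERR", "exceeded failed logon limit"},
    {585, "LVAL_ERR", "logon date exception"},
    {589, "LADM_ERR", "member of ADMIN group required"},
    {593, "XUSR_ERR", "non-ADMIN user blocked from logon"},
    {609, "LTPW_ERR", "one-use temporary password failure"},
};

#define LOGON_ERROR_COUNT (sizeof LOGON_ERRORS / sizeof LOGON_ERRORS[0])

/* Returns the entry for a numeric code, or NULL if it is not listed. */
const struct logon_error *find_logon_error(int code)
{
    for (size_t i = 0; i < LOGON_ERROR_COUNT; i++)
        if (LOGON_ERRORS[i].code == code)
            return &LOGON_ERRORS[i];
    return NULL;
}

/* Returns the entry for a symbolic name such as "MUSR_ERR", or NULL. */
const struct logon_error *find_logon_error_by_symbol(const char *symbol)
{
    for (size_t i = 0; i < LOGON_ERROR_COUNT; i++)
        if (strcmp(LOGON_ERRORS[i].symbol, symbol) == 0)
            return &LOGON_ERRORS[i];
    return NULL;
}
```

An unlisted code (a NULL result) is worth logging as-is and looking up in the full FairCom DB error code reference.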

FairCom DB Error 10: SPAC_ERR

Error description

Memory allocation error during logon.

Possible causes

An attempt to allocate memory during logon failed. The most common cause is a shortage of system memory.

Troubleshooting steps

Check system memory usage by FairCom DB and other processes on the system. Free system memory (for example, by stopping unnecessary processes or shutting down and restarting FairCom DB) and attempt to log on again.

FairCom DB Error 84: MUSR_ERR

Error description

Maximum users exceeded.

Possible causes

FairCom DB enforces a limit on the number of concurrently connected clients. This error indicates that the maximum number of concurrent connections has been reached.

Troubleshooting steps

In some cases, it is expected that the maximum number of clients may be logged on. In this situation, try the operation again after clients have logged off. If reaching the connected user limit is unexpected, use the ctadmn utility to list the current client connections, view their activity, and determine whether any inactive client connections can be terminated. When the number of connected clients is below the connection limit, try to log on again. Note: the server is designed to allow one instance of the user ID “ADMIN” to connect even if the maximum number of supported connections has been reached.

Note: A client that belongs to the ADMIN group and sets the USERPRF_ADMSPCL bit of the user profile can log on to FairCom DB even if the server already has the maximum permitted number of clients logged on. This capability is useful when system administration must be performed even though the client limit has been reached. The ctadmn utility automatically uses this feature.
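Because MUSR_ERR can be an expected, transient condition, a client may choose to retry the logon after a delay. The sketch below is hypothetical: logon_attempt_fn stands for a wrapper you would write around your actual logon call (for example, InitISAMXtd() or CTDBLogon()) that returns 0 on success or the FairCom DB error code on failure. Only error 84 is retried; any other result is returned at once, since waiting does not fix it. The fake_attempt function is a stand-in that lets the loop be tried without a server.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* MUSR_ERR ("maximum users exceeded"), as listed above. */
#define MUSR_ERR_CODE 84

/* Hypothetical wrapper type: makes one logon attempt and returns 0 on
 * success or a FairCom DB error code on failure. */
typedef int (*logon_attempt_fn)(void *ctx);

/* Calls attempt() up to max_tries times. Only MUSR_ERR is retried, after
 * sleeping delay_ms milliseconds (doubled after each retry). Any other
 * result, including success, is returned immediately. */
int logon_with_retry(logon_attempt_fn attempt, void *ctx,
                     int max_tries, long delay_ms)
{
    if (max_tries < 1)
        max_tries = 1;

    int rc = MUSR_ERR_CODE;
    for (int i = 0; i < max_tries; i++) {
        rc = attempt(ctx);
        if (rc != MUSR_ERR_CODE)
            return rc;

        if (i + 1 < max_tries) { /* no sleep after the final attempt */
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);
            delay_ms *= 2;
        }
    }
    return rc; /* still MUSR_ERR after every attempt */
}

/* Stand-in logon for trying the loop without a server: returns MUSR_ERR
 * while the int pointed to by ctx is positive, decrementing it each time,
 * and 0 once it reaches zero. */
int fake_attempt(void *ctx)
{
    int *busy = ctx;
    if (*busy > 0) {
        (*busy)--;
        return MUSR_ERR_CODE;
    }
    return 0;
}
```

For example, with busy set to 2, logon_with_retry(fake_attempt, &busy, 5, 100) fails twice with MUSR_ERR and succeeds on the third attempt.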

FairCom DB Error 127: ARQS_ERR

Error description:
Could not send request. A communication error occurred while sending a request to the server.

Possible causes and troubleshooting steps:
The section "Clients Lose Connection to Server" describes possible causes and troubleshooting steps for this error.

FairCom DB Error 128: ARSP_ERR

Error description:
Could not receive answer. A communication error occurred while receiving a response from the server.

Possible causes and troubleshooting steps:
The section "Clients Lose Connection to Server" describes possible causes and troubleshooting steps for this error.

FairCom DB Error 133: ASKY_ERR

Error description:
The server could not be located.

Possible causes:

FairCom DB is not operational.
The specified server name or location is incorrect.
Network connectivity problems prevent the client from connecting to the server.
The client is using a different communication protocol than the server is using.

Troubleshooting steps:

Verify that FairCom DB is operational. Use system utilities to confirm that the server process is active.
Confirm that the server is using the same communication protocol as the client. This can be determined by examining the server configuration file (ctsrvr.cfg) and the server status log (CTSTATUS.FCS).
Verify that the server name and location are correctly specified for the communication protocol the client is using. Note that the SERVER_NAME value is case sensitive. If the TCP/IP communication protocol is being used, be sure the Server_Name@Machine_Name format is followed. For example, FAIRCOMS@128.128.128.128.
Check the network connectivity. If the server and client reside on separate machines, ping the server machine from the client machine using the host name or IP address specified when connecting. Use system utilities to verify that the server is listening for incoming connections. Try connecting using ctadmn from the server machine and from the client machine.
Add the option DIAGNOSTICS LOGON_COMM to the FairCom DB configuration file and restart the server. This option causes the server to write detailed logon status messages to its console. These messages can help determine at what point the connection attempt failed. For example, the messages can show whether or not the server was aware of the client’s connection attempt. Note: tracking of logon details on the client side can be enabled by compiling the FairCom DB client library with #define CT_DIAGNOSE_LOGON_COMM enabled. The client outputs logon messages to standard output.
Shut down and restart FairCom DB to cause it to reinitialize its communication subsystem.
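For TCP/IP connections, the server location takes the form Server_Name@Machine_Name (for example, FAIRCOMS@128.128.128.128), and it can be assembled and sanity-checked before it is passed to the logon call. This minimal sketch uses a hypothetical helper name; it rejects empty parts and parts that already contain '@', and it copies the server name unchanged because SERVER_NAME is case sensitive.

```c
#include <stdio.h>
#include <string.h>

/* Builds "Server_Name@Machine_Name" into out. Returns 0 on success, or -1
 * if either part is empty, already contains '@', or the result does not
 * fit in out_size bytes. */
int make_server_location(char *out, size_t out_size,
                         const char *server_name, const char *machine)
{
    if (server_name[0] == '\0' || machine[0] == '\0')
        return -1;
    if (strchr(server_name, '@') != NULL || strchr(machine, '@') != NULL)
        return -1;

    /* Server name first, then the host name or IP address. */
    int n = snprintf(out, out_size, "%s@%s", server_name, machine);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```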

FairCom DB Error 150: SHUT_ERR

Error description:
Server is shutting down. The client connected to the server but the server closed the connection.

Possible causes and troubleshooting steps:
The section "Clients Lose Connection to Server" describes possible causes and troubleshooting steps for this error.

FairCom DB Error 162: SGON_ERR

Error description:
Server has gone away. The client connected to the server but the server closed the connection.

Possible causes and troubleshooting steps:
The section "Clients Lose Connection to Server" describes possible causes and troubleshooting steps for this error.

FairCom DB Error 450: LUID_ERR

Error description:
Invalid user ID.

Possible causes:
The specified user ID does not exist.

Troubleshooting steps:

Use the ctadmn utility to list the user IDs recognized by the server.
Log on using a valid user ID or create a new user account with the specified user ID.

FairCom DB Error 451: LPWD_ERR

Error description:
Invalid password.

Possible causes:
The specified password is invalid for the specified user ID. Note that passwords are case-sensitive.

Troubleshooting steps:
To resolve this error, log on using the correct password for the specified user ID or use the ctadmn utility to change the password to match the specified password.

FairCom DB Error 452: LSRV_ERR

Error description:
Server could not process user or account information. FairCom DB encountered an error when reading the account information for the specified user.

Possible causes:
This logon error points to possible problems with the user account data in the FAIRCOM.FCS superfile. Depending on the location of the problem, the c-tree Server may log one of the following messages to CTSTATUS.FCS:

User ID file processing error
User/Group file processing error
Valid ID file processing error

Troubleshooting steps:
Use the ctadmn utility to list the properties for the specified user account. If necessary, change account properties or delete and re-create the account. If the error persists, using the ctscmp utility to compact FAIRCOM.FCS may help correct this type of error. Or, if a good backup copy of FAIRCOM.FCS exists, restoring this file from backup is an option.

FairCom DB Error 470: LGST_ERR

Error description:
Guest logons disabled.

Possible causes:
The client attempted to log on as a guest by specifying a NULL or empty user ID, but guest logons are disabled. The guest account is disabled unless GUEST_LOGON YES is specified in a server settings or configuration file (specifying GUEST_LOGON NO also disables it).

Troubleshooting steps:
To avoid this error, log on using a non-guest account or reconfigure FairCom DB to allow guest access.

FairCom DB Error 530: LMTC_ERR

Error description:
Client does not match server.

Possible causes:
Some FairCom DB instances only accept connections by specially-configured FairCom DB clients. When a standard FairCom DB client attempts to connect to such a server, the server rejects the connection attempt with FairCom DB error 530.

Troubleshooting steps:
To avoid this error, connect to the server using the appropriate specially-configured FairCom DB client.

FairCom DB Error 579: LIVL_ERR

Error description:
Logon interval error. FairCom DB can be configured to require a user to log on at least once within a defined time period. This requirement can be configured per user account and, if desired, system-wide. If the user fails to log on at least once within the prescribed time period, FairCom DB disables the user account and subsequent logon attempts using that user ID fail with FairCom DB error 579.

Possible causes:
A required logon time period is in effect for the specified user account and the user did not log on at least once within the specified time period.

Troubleshooting steps:
An administrator can re-enable a user account that FairCom DB has deactivated due to exceeding the required logon interval by using the ctadmn utility.

FairCom DB Error 584: LRSM_ERR

Error description:
Exceeded failed logon limit. FairCom DB supports setting a limit on the number of consecutive logon failures due to specifying an invalid password. This limit can be set on a per-user and on a system-wide basis. If a user exceeds the specified number of consecutive logon failures, the server disables the user account and subsequent logon attempts using that user ID fail with FairCom DB error 584.

Possible causes:
A consecutive logon failure limit is in effect for the specified user account and the user has exceeded the limit.

Troubleshooting steps:
An administrator can re-enable a user account that FairCom DB has deactivated due to exceeding the failed logon limit by using the ctadmn utility.

FairCom DB Error 585: LVAL_ERR

Error description:
Logon date exception. FairCom DB supports setting starting and ending validity dates for user accounts. An attempt to log on using a user account before its starting validity date or after its ending validity date fails with c-tree error 585.

Possible causes:
A validity period is in effect for the specified user account and the starting validity date has not yet arrived, or the ending validity date has already passed.

Troubleshooting steps:
An administrator can change the starting and ending validity dates for a user account using the ctadmn utility.

FairCom DB Error 589: LADM_ERR

Error description:
Member of ADMIN group required.

Possible causes:
A client is attempting to connect to FairCom DB in an administrative mode (for example, logging on using ctadmn or logging on in order to shut down the server) but the specified user ID is not a member of the ADMIN group.

Troubleshooting steps:
Log on using a user ID that is a member of the ADMIN group, or use the ctadmn utility to add the specified user ID to the ADMIN group.

FairCom DB Error 593: XUSR_ERR

Error description:
Non-ADMIN user blocked from logon. FairCom DB can be put into an operational mode in which only users that are members of the ADMIN group are allowed to log on. When this mode is in effect, logon attempts by non-ADMIN users are rejected with FairCom DB error 593.

Possible causes:
The server administrator has enabled FairCom DB’s ADMIN-only logon mode using the FairCom DB SECURITY API function or the STARTUP_BLOCK_LOGONS keyword in the server configuration file.

Troubleshooting steps:
To resume normal server operation, the server administrator can call the c-tree Plus SECURITY API function with a mode of SEC_BLOCK_OFF.

FairCom DB Error 609: LTPW_ERR

Error description:
One-use temporary password failure. FairCom DB supports establishing a one-time password that another client can use (along with the user ID matching the caller that set the password) as long as the original caller is still logged on. Once the original caller logs off, the one-time password becomes invalid. Once the password is used by the subsequent client, the password is no longer available.

Possible causes:
Either no one-time password is in effect for the specified user ID, or the specified password is invalid.

Troubleshooting steps:
Confirm that the client that established the one-time password is still logged on, that no other client has already connected using the one-time password, and that the specified password is correct.

Clients Lose Connection to Server

A connected client may lose its connection to FairCom DB for various reasons. If a client loses its connection to the server, subsequent calls to FairCom DB functions that send requests to the server return an error indicating that the connection has been lost. The topics below list the errors a FairCom DB API function may return in the event of a lost connection and describe possible causes and troubleshooting steps for each:

In This Section

FairCom DB Error 7: TUSR_ERR

FairCom DB Error 127: ARQS_ERR

FairCom DB Error 128: ARSP_ERR

FairCom DB Error 150: SHUT_ERR

FairCom DB Error 162: SGON_ERR

FairCom DB Error 7: TUSR_ERR

Error description:
Terminate user. FairCom DB or server administrator has terminated the client’s connection to the server.

Possible causes:

An administrator may have terminated the client connection.
FairCom DB may have terminated the client connection (for example, when the server is shutting down or when the server aborts a transaction).

Troubleshooting steps:
Examine CTSTATUS.FCS to determine the reason the connection was terminated. See the discussion of FairCom DB error 127 for additional details about reconnecting to FairCom DB after a client connection is terminated.

FairCom DB Error 127: ARQS_ERR

Error description:
Could not send request. A communication error occurred while sending a request to the server.

Possible causes:

The server may have shut down.
The server may have terminated abnormally.
An administrator may have terminated the client connection.
A network error may have occurred that terminated the connection to the server.

Troubleshooting steps:

Verify that the server is operational. Restart the server if it is not operational.
When a client loses its connection to FairCom DB and the server detects the lost connection, the server aborts the client’s active transaction (if any) and closes any files that the client had open. If the server is still operational, use the ctadmn utility to confirm that the server thread for the lost connection has been terminated. If necessary, terminate the old connection using ctadmn.
To determine if the connection was terminated by an administrator, check CTSTATUS.FCS for messages of the form:

   External kill request posted against user #<taskID>

where <taskID> is a server-assigned thread task ID.

Attempt to reconnect to the server.
If this communication error occurs at logon time, consider using the DIAGNOSTICS LOGON_COMM server configuration keyword to monitor the logon process, as described in the discussion of FairCom DB error 133.
If this communication error occurs frequently without explanation (for example, connections are lost but FairCom DB remains active), investigate the possibility of network connectivity errors.
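Searching CTSTATUS.FCS for the kill-request message can be automated. The following C sketch assumes only the message text shown above. Any timestamp or prefix on the line is skipped by searching for the message within the line. The function name kill_request_task_id is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns the server-assigned task ID from a status log line containing
   "External kill request posted against user #<taskID>", or -1 if the
   line does not contain that message. */
long kill_request_task_id(const char *line)
{
    static const char marker[] = "External kill request posted against user #";
    const char *p = strstr(line, marker);
    long task_id;

    if (p == NULL)
        return -1; /* not a kill-request message */
    if (sscanf(p + strlen(marker), "%ld", &task_id) != 1)
        return -1; /* message present but no numeric task ID follows */
    return task_id;
}
```

Calling this on each line read with fgets() from CTSTATUS.FCS reports the task ID of each administrator-initiated termination.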

FairCom DB Error 128: ARSP_ERR

Error description:
Could not receive answer. A communication error occurred while receiving a response from the server.

Possible causes and troubleshooting steps:
The possible causes and troubleshooting steps are the same as for error 127.

FairCom DB Error 150: SHUT_ERR

Error description:
Server is shutting down. The client connected to the server but the server closed the connection.

Possible causes:
FairCom DB is refusing new connections because it is in the process of shutting down.

Troubleshooting steps:
Check the status of the FairCom DB process, and restart FairCom DB if necessary. When the server is operational, retry the connection attempt.

FairCom DB Error 162: SGON_ERR

Error description:
Server has gone away. The client connected to the server but the server closed the connection.

Possible causes and troubleshooting steps:
The possible causes and troubleshooting steps for this error are the same as described for FairCom DB error 150.
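An application can group the errors in this section and retry its logon when one of them occurs. The following is a minimal sketch. The logon and pause callbacks are hypothetical placeholders for the application's own logon call and its delay or backoff policy. The error numbers are the ones listed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* True for the c-tree errors this section associates with a lost
   or refused connection. */
bool is_connection_lost_error(int err)
{
    switch (err) {
    case 7:   /* TUSR_ERR: connection terminated */
    case 127: /* ARQS_ERR: could not send request */
    case 128: /* ARSP_ERR: could not receive answer */
    case 150: /* SHUT_ERR: server is shutting down */
    case 162: /* SGON_ERR: server has gone away */
        return true;
    default:
        return false;
    }
}

/* Hypothetical callbacks: do_logon returns 0 or a c-tree error code;
   pause_before_retry (may be NULL) sleeps or backs off between attempts. */
typedef int (*logon_fn)(void *ctx);
typedef void (*pause_fn)(int attempt);

/* Retries a logon while failures look like a lost connection.
   Returns 0 on success, otherwise the last error code. */
int logon_with_retry(logon_fn do_logon, void *ctx, int max_attempts,
                     pause_fn pause_before_retry)
{
    int err = 0;

    if (max_attempts < 1)
        max_attempts = 1; /* always try at least once */
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        err = do_logon(ctx);
        if (err == 0 || !is_connection_lost_error(err))
            return err; /* success, or an error a retry will not fix */
        if (pause_before_retry != NULL && attempt < max_attempts)
            pause_before_retry(attempt);
    }
    return err;
}
```

After a successful reconnect, the application must reopen its files and redo any work from its interrupted transaction. The server aborts a lost client's active transaction and closes its files.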

Number of Active Transaction Logs Unexpectedly Increases

The number of active transaction log files required to support FairCom DB operation is normally 4. Each time the server creates a new log, it determines whether or not the oldest existing log can be deleted (or made inactive, if the KEEP_LOGS option is specified in the server configuration file). A number of conditions require the server to keep more than 4 active log files. It is important for the server administrator to detect cases in which the number of active logs increases significantly, and to understand the cause and whether or not action is required.

When the number of log files is about to increase, the server logs the following message to CTSTATUS.FCS:

The number of active log files increased to <nlogs>

where <nlogs> is the new number of active logs.
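Because the message text is fixed, a monitoring tool can watch for the increase automatically. This C sketch assumes only the message wording shown above. The function names and the alert threshold are hypothetical choices for illustration, not part of FairCom DB.

```c
#include <stdio.h>
#include <string.h>

/* Returns <nlogs> from a status log line containing
   "The number of active log files increased to <nlogs>", or -1. */
int active_log_count(const char *line)
{
    static const char marker[] = "The number of active log files increased to ";
    const char *p = strstr(line, marker);
    int nlogs;

    if (p == NULL || sscanf(p + strlen(marker), "%d", &nlogs) != 1)
        return -1;
    return nlogs;
}

/* Returns 1 when the reported count exceeds an administrator-chosen
   threshold (the threshold is an assumption of this sketch), else 0. */
int log_count_needs_attention(const char *line, int threshold)
{
    return active_log_count(line) > threshold;
}
```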

The most important of the situations that require keeping the oldest transaction log active are:

Increasing CHECKPOINT_FLUSH to delay flushing of buffers associated with committed transactions:

When creating a new transaction log, FairCom DB determines the recoverability vulnerability due to unflushed buffers associated with committed transactions and keeps the required number of active logs. The following formula can be used to estimate the number of logs required to support unflushed buffer/cache pages based on server configuration settings.

Let:

CPF = CHECKPOINT_FLUSH value (defaults to 2)
CPL =  number of checkpoints per log (typically 3 and no less than 3)
MNL =  minimum number of logs to support unflushed pages

Then:

MNL =  ((CPF + CPL - 1) / CPL) + 2, where integer division is used

For example:

CPF=2,   CPL=3  => MNL = 3 (But the server enforces a minimum of 4)
CPF=7,   CPL=3  => MNL = 5
CPF=9,   CPL=3  => MNL = 5
CPF=10, CPL=3  => MNL = 6
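The formula translates directly into integer arithmetic. The C sketch below reproduces it, including the minimum of 4 that the server enforces. The function names are hypothetical. cpl must be positive, and the text says it is never less than 3.

```c
/* MNL = ((CPF + CPL - 1) / CPL) + 2 using integer division, where
   cpf is the CHECKPOINT_FLUSH value and cpl is checkpoints per log. */
int mnl_formula(int cpf, int cpl)
{
    return ((cpf + cpl - 1) / cpl) + 2;
}

/* The server never keeps fewer than 4 active logs. */
int min_active_logs(int cpf, int cpl)
{
    int mnl = mnl_formula(cpf, cpl);
    return mnl < 4 ? 4 : mnl;
}
```

With the default CHECKPOINT_FLUSH of 2 and 3 checkpoints per log, mnl_formula(2, 3) is 3 but min_active_logs(2, 3) is 4, matching the first example above. min_active_logs(10, 3) is 6.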

A pending transaction that began several logs ago and still has not committed or aborted:
Unlike CHECKPOINT_FLUSH, which leads to a well-defined, limited increase in the number of active transaction logs, a long uncommitted transaction can lead to an unlimited increase in transaction logs. For example, if a client begins a transaction and then sits idle while other clients execute transactions, the other clients’ transaction activity fills the transaction logs. When the server creates new logs, it finds that the idle client’s transaction has not committed. The server must keep the log containing the idle client’s transaction begin until that client commits or aborts its transaction. For this reason, it is important to monitor the number of active transaction logs. In the event of an unexpectedly long transaction, the ctadmn utility can be used to list connected clients and their transaction times and to terminate clients as needed.
Dynamic dumps:
A dynamic dump is similar to the case of a long pending transaction. The dynamic dump must keep all transaction logs from the dump start time to the dump end time so that the dump stream file includes all transaction activity that occurred during the dump. If very large files are included in the dump, the dynamic dump can take a significant amount of time. Depending on the amount of transaction activity between the dump start and end times, the number of active logs that must be kept during a dynamic dump can be large.

FairCom DB logs to CTSTATUS.FCS an explanation of the condition that triggered the increase. When the increase is caused by a pending transaction, the server attempts to identify the user ID and node name associated with the transaction. Based on the cause shown in CTSTATUS.FCS, the server administrator can take the appropriate action, if any. For example, a long pending transaction can be aborted by using the ctadmn utility to terminate the client that began the transaction, or a dynamic dump can be terminated using ctadmn.

Server Is in a Non-Responsive State

The symptoms of a non-responsive FairCom DB are the following:

Requests from connected clients hang indefinitely.
Client connection attempts hang or fail with an error such as FairCom DB error 133.

A non-responsive server can be detected by monitoring connected clients and looking for client requests that have not completed within a reasonable timeframe. Note that some c-tree API function calls can be expected to take a significant amount of time (for example, when rebuilding a large file or physically closing a file that has many unwritten updated cache buffers). Even reading a record can take a while if the call must wait to acquire a lock on the record. The specific symptoms of a fully non-responsive FairCom DB are that all client requests hang and new connection attempts may fail.
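The heuristic above can be sketched in C: one slow request is not conclusive, but every request hanging points at the server. The struct client_request type is hypothetical. It stands for per-client data, such as the current function name and request time that ctadmn displays, collected by whatever monitoring tooling is in use. It is not a FairCom API type, and the time limit is the administrator's choice.

```c
#include <stddef.h>

/* Hypothetical per-client record, e.g. transcribed from a ctadmn listing. */
struct client_request {
    int task_id;
    const char *function_name; /* current request, as shown by ctadmn */
    long request_seconds;      /* time spent in the current request */
};

/* Counts clients whose current request has exceeded limit_seconds. */
size_t count_slow_requests(const struct client_request *clients, size_t n,
                           long limit_seconds)
{
    size_t slow = 0;

    for (size_t i = 0; i < n; i++)
        if (clients[i].request_seconds > limit_seconds)
            slow++;
    return slow;
}

/* Returns 1 only when every client request is over the limit, the pattern
   described above for a fully non-responsive server; otherwise 0. */
int looks_non_responsive(const struct client_request *clients, size_t n,
                         long limit_seconds)
{
    return n > 0 && count_slow_requests(clients, n, limit_seconds) == n;
}
```

A long-running rebuild or a request blocked on a lock shows up as a slow request without the whole server being hung. The all-clients check keeps these two cases apart.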

When FairCom DB is in a non-responsive state, follow these steps to identify the cause and correct the problem:

The ctadmn utility can be used to monitor the status of client connections, but in the event of a non-responsive server, ctadmn may be unable to connect to the server or its requests may also hang. If ctadmn can connect and can list connected clients, it may be possible to use it to terminate FairCom DB clients or to shut down FairCom DB.
If ctadmn cannot connect to the server, use system monitoring utilities to determine the state of the FairCom DB process. Verify that the process is in a running state and if possible use system utilities to save a core image of FairCom DB process and stack traces for all the server threads. See Monitoring FairCom DB in the FairCom DB Server Administrator's Guide for details about system utilities that can be used to collect information about the state of the FairCom DB process. Any information that can be collected about the state of a non-responsive server can be useful in determining the cause and finding a way to prevent future occurrences of such a problem.
To shut down a non-responsive FairCom DB, follow the steps described in the section Stopping FairCom DB in the FairCom DB Server Administrator's Guide. Shutting down a non-responsive server might require a hard kill.

Some Clients Are In A Non-Responsive State

A FairCom DB client can be in a non-responsive state if a server request hangs or takes a long time to complete. As discussed above, some c-tree calls (such as rebuild calls, or file close calls that involve physically closing a file) may take a while to complete. Calls to read a record with a blocking lock hang until the lock can be acquired.

If requests made by one or more clients do not complete in a reasonable timeframe, follow these steps to identify the cause and to correct the problem:

The ctadmn utility can be used to view the current state of the client connections, including the function name for the current request being processed on behalf of each client and the current request time.
FairCom DB’s snapshot and lock dump capabilities can be used to collect details about the state of FairCom DB. For example, if ctadmn shows one or more clients hanging on record read operations, use the FairCom DB API function LockDump() to dump the current state of the server’s lock table to disk, and examine this log to determine if the read requests are simply blocking waiting to acquire a record lock. If this is the case, identify from the log which client currently holds the lock and use ctadmn to view the activity of that client or to terminate that client if appropriate.
If the cause of the long request time cannot be determined using FairCom DB utilities or FairCom DB API functions, use system utilities to monitor the server’s system calls and to collect details about the server process state. Saving a core image and stack traces for the server threads in this type of situation can help identify the cause of the hanging client requests so that the problem can be resolved.
If necessary, use ctadmn to terminate client connections that are non-responsive. If the connections still appear in the list of connected clients after they have been terminated using ctadmn, FairCom DB may have to be shut down and restarted to clear the hung connections.

Errors Occur When Opening FairCom DB Files

A FairCom DB data or index file can fail to open for a variety of reasons. This section introduces a server configuration keyword that can be used to log system error details in the event of failed file open, create, close, delete and rename operations. The remainder of the section focuses on specific FairCom DB errors returned by FairCom DB file open API functions and ways to resolve the errors.

In This Section

Enabling Low-Level File I/O Diagnostics

FairCom DB Error 12: FNOP_ERR

FairCom DB Error 14: FCRP_ERR

FairCom DB Error 417: SPAG_ERR

FairCom DB Error 456: SACS_ERR

FairCom DB Error 457: SPWD_ERR

Enabling Low-Level File I/O Diagnostics

The FairCom DB configuration keyword DIAGNOSTICS LOWL_FILE_IO is useful in troubleshooting file open, create, close, delete, and rename errors. This keyword causes the server to log the filename and system error code for failed file open, create, close, delete, and rename operations to the server status log, CTSTATUS.FCS. Although client applications have access to system errors through the c-tree global variable sysiocod, it can be useful to have the server log these errors. For example, an end user had problems opening a file that had been copied from a CD-ROM to the hard disk, which left the file marked read-only. When the user attempted to open the file, the open failed with FairCom DB error 12. Adding the DIAGNOSTICS LOWL_FILE_IO keyword directed the server to log the system error code to CTSTATUS.FCS, which showed that the open failed because the file was marked read-only.

FairCom DB Error 12: FNOP_ERR

Error description:
Could not open file: not there or locked.

Possible causes:

The specified filename or path is incorrect (no file by that name exists).
The specified file is already open using a file access mode that conflicts with the specified access mode. For example, if the file is open in EXCLUSIVE mode, attempting to open the file in SHARED mode fails (and vice-versa).
The server does not have the appropriate permission to access the file (for example, the file is marked read-only, or system file security attributes disallow access by the user and group under which the server is running).

Troubleshooting steps:

Verify that the specified filename and path are correct. Note that filenames on some operating systems are case-sensitive. If using ISAM open functions, it is possible that the data file open succeeded but an index file open failed. Check the isam_fil global variable to determine which file open failed. Try opening the data and index files individually using low-level functions to determine which open fails.
Check the FairCom DB global variable sysiocod. If it is set to -8 (FCNF_COD), this indicates that the open failed due to conflicting access modes.
Check the file permissions to verify that FairCom DB has the appropriate file access permissions (read/write access is usually what is required).
If the failed open occurs on a superfile member, verify that the member exists using the FairCom DB API function GetSuperFileNames() and that the member name is specified exactly as it appears in the superfile directory index (member names are always case-sensitive).
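The sysiocod check in the steps above can be written as a small classification helper. This is a sketch. FNOP_ERR and FCNF_COD are defined locally with the values given in this section so that the example compiles on its own. Real application code would take them from the c-tree headers and read sysiocod after the failed open. The enum and function names are hypothetical.

```c
/* Values as given in this section; real code would use the c-tree headers. */
#define FNOP_ERR 12   /* could not open file: not there or locked */
#define FCNF_COD (-8) /* sysiocod: conflicting file access modes */

enum fnop_cause {
    FNOP_NOT_THIS_ERROR,     /* the call did not fail with error 12 */
    FNOP_ACCESS_CONFLICT,    /* file already open in a conflicting mode */
    FNOP_CHECK_NAME_OR_PERMS /* check filename, path, case, and permissions */
};

/* Maps a failed open's error code and sysiocod to the likely cause. */
enum fnop_cause classify_fnop(int ctree_err, int sysiocod)
{
    if (ctree_err != FNOP_ERR)
        return FNOP_NOT_THIS_ERROR;
    if (sysiocod == FCNF_COD)
        return FNOP_ACCESS_CONFLICT;
    return FNOP_CHECK_NAME_OR_PERMS;
}
```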

FairCom DB Error 14: FCRP_ERR

Error description:
File corrupt at open.

Possible causes:
FairCom DB sets an update flag in the header of FairCom DB data and index files on the first update to the file after the file is opened. The server resets the update flag when the file is physically closed (after all updated cache pages for the file are written to disk). When the server finds the update flag still set when opening the file, the server considers this to mean that the file was updated but was not properly closed, and so the state of the file is unknown. For example, if a file is updated and then the server terminates abnormally, the update flag remains set, which indicates that some updates might not have been written to the file. Error 14 is most likely to occur for PREIMG and non-transaction files. Because the server’s automatic recovery processes TRNLOG files in the event of an abnormal server termination, error 14 is not expected for TRNLOG files unless automatic recovery fails. See the discussion of TRNLOG, PREIMG, and non-transaction files for full details about caching and the state of files in the event of an abnormal server termination.

Troubleshooting steps:

If the file is a TRNLOG file, review the sequence of events that led up to the error 14. If FairCom DB terminated abnormally, restarting the server should cause automatic recovery to occur, restoring the TRNLOG files to a consistent transaction state and avoiding the possibility of error 14 occurring.
If the file is a PREIMG or non-transaction file and the server terminated abnormally, error 14 can be avoided by rebuilding the affected files, or re-creating the files and re-loading their data from an external source, or by restoring backup copies of the files. To avoid data loss and error 14 for non-TRNLOG files in the event of an abnormal server termination, consider using the WRITETHRU filemode and WRITETHRU server configuration options. See the discussion of the WRITETHRU filemode for details.

FairCom DB Error 417: SPAG_ERR

Error description:
Cache page size error.

Possible causes:
When a superfile is opened, the index node size currently in effect must be the same as the index node size at the time the superfile was created. This is not usually a problem unless one wishes to move the file between different environments. Error 417 results if the node size does not match. By contrast, an ordinary index file can be opened as long as the current node size is at least as large as the node size at the time the index file was created.

Troubleshooting steps:
To resolve this error, either:

Re-create the superfile using the page size setting currently used by FairCom DB, or
Change the server’s PAGE_SIZE setting to ensure it matches the page size used when creating the superfile and restart FairCom DB.

FairCom DB Error 456: SACS_ERR

Error description:
Group access denied.

Possible causes:
The user that is attempting to open a file is not a member of a group with access to the file.

Troubleshooting steps:
Use ctadmn to list the file permissions assigned to the file. To avoid this error, either add the user to a group that has permission to access the file or change the file permissions to allow the user to access the file.

FairCom DB Error 457: SPWD_ERR

Error description:
File password invalid.

Possible causes:
A password is assigned to the file and the file password specified when opening the file is incorrect.

Troubleshooting steps:
Specify the correct file password for the file, or use the ctadmn utility to change or reset the file’s password.

Errors Occur When Reading or Writing FairCom DB Files

In This Section

FairCom DB Error 35: SEEK_ERR

FairCom DB Error 36: READ_ERR

FairCom DB Error 37: WRITE_ERR

FairCom DB Error 39: FULL_ERR

FairCom DB Error 40: KSIZ_ERR

FairCom DB Error 49: FSAV_ERR

FairCom DB Error 35: SEEK_ERR

Error description:
Seek error. A file seek operation on a FairCom DB data or index file failed.

Possible causes:
A file seek operation can fail for various reasons, including disk media errors or an invalid file descriptor.

Troubleshooting steps:
In the event of a SEEK_ERR, FairCom DB logs the following message to CTSTATUS.FCS:

SEEK_ERR...
<filename>

where <filename> is the name of FairCom DB data or index file for which the seek operation failed.

When a FairCom DB API function returns FairCom DB error 35, check the value of the FairCom DB global variable sysiocod, which contains the system error code returned by the failed seek operation. The interpretation of the system error code can explain the cause of the failed seek operation.

FairCom DB Error 36: READ_ERR

Error description:
Read error. A file read operation on a FairCom DB data or index file failed.

Possible causes:
A file read operation can fail for various reasons, including disk media errors and inaccessible files due to locking by external applications.

Troubleshooting steps:
When a FairCom DB API function returns FairCom DB error 36, check the value of the FairCom DB global variable sysiocod, which contains the system error code returned by the failed file read operation. The interpretation of the system error code can explain the cause of the failed read operation.

FairCom DB Error 37: WRITE_ERR

Error description:
Write error. A file write operation on a FairCom DB data or index file failed.

Possible causes:
A file write operation can fail for various reasons, including disk media errors, insufficient disk space, and inaccessible files due to locking by external applications.

Troubleshooting steps:
When a FairCom DB API function returns FairCom DB error 37, check the value of the FairCom DB global variable sysiocod, which contains the system error code returned by the failed file write operation. The interpretation of the system error code can explain the cause of the failed write operation.
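The three sections above share one step: translating sysiocod into a readable system error. The following is a minimal C sketch. It assumes sysiocod holds an operating system errno value that the standard strerror() function can describe. Verify this for your platform, since on Windows the code may be a Win32 error instead. The function name describe_io_error is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Builds a message for c-tree errors 35, 36 and 37. Assumes sysiocod is an
   errno value (an assumption of this sketch, not a documented guarantee).
   Returns the snprintf() result. */
int describe_io_error(char *buf, size_t len, int ctree_err, int sysiocod)
{
    const char *op = ctree_err == 35 ? "seek"
                   : ctree_err == 36 ? "read"
                   : ctree_err == 37 ? "write"
                   : "file I/O";

    return snprintf(buf, len, "%s failed (c-tree error %d): system code %d: %s",
                    op, ctree_err, sysiocod, strerror(sysiocod));
}
```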

FairCom DB Error 39: FULL_ERR

A FULL_ERR (39) error means that a file has reached its maximum size and no space is available to add records. For a non-HUGE file, this is not easy to resolve: the file must be converted to a HUGE file and all of its indexes rebuilt. The ctcv67 utility helps in this case:

>ctcv67  <file.dat>   <path/to/new/file.da>  H yes

The file must be taken offline while the conversion takes place. ctcv67 is a standalone utility and cannot run while the file remains under server control.

FairCom DB Error 40: KSIZ_ERR

Error description:
Index node size too large. FairCom DB was not able to open the specified index file, superfile, or variable-length data file because the page size used when creating the file is larger than the server’s current page size. The page size determines the maximum supported index node size.

Possible causes:
The file was created using a larger PAGE_SIZE setting than the server is currently using. This situation can arise if the file was created by a server or a standalone FairCom DB utility that is configured to use a smaller page size than the server is currently using, or if after the file was created the PAGE_SIZE setting was changed and the server was restarted.

Troubleshooting steps:
To resolve this error, either:

Re-create the file (by rebuilding or compacting the file) using the page size setting currently used by FairCom DB, or
Change the server’s PAGE_SIZE setting to ensure it is at least as large as the page size used when creating the file and restart FairCom DB.

Note: A FairCom DB superfile has stricter page size requirements than a FairCom DB index. A superfile can only be opened by a server whose PAGE_SIZE exactly matches the page size used when creating the superfile. See the discussion of FairCom DB error 417 for details.
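The page size rules for errors 40 and 417 can be summarized as a single check. In this sketch, the enum and function names are hypothetical. The check reports only whether an open should succeed, not which of the two errors a server returns for a given mismatch.

```c
#include <stdbool.h>

enum ctfile_kind {
    CTFILE_INDEX,    /* index file */
    CTFILE_VARLEN,   /* variable-length data file */
    CTFILE_SUPERFILE /* superfile: exact page size match required */
};

/* Applies the rules above: ordinary index and variable-length files open
   when the server page size is at least the creation page size; superfiles
   require an exact match. */
bool page_size_allows_open(enum ctfile_kind kind, int created_page_size,
                           int server_page_size)
{
    if (kind == CTFILE_SUPERFILE)
        return created_page_size == server_page_size;
    return created_page_size <= server_page_size;
}
```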

c-tree Plus ODBC Driver

c-tree Plus ODBC Driver users may experience error 40 when an application’s index file is using a page size larger than allocated by the ODBC driver. To adjust this, access the Windows ODBC Data Source Administrator.

Note: To configure the 32-bit ODBC driver on 64-bit versions of Windows, you must run the ODBC Data Source Administrator from the following location: %WINDIR%\syswow64\odbcad32.exe

In the ODBC Data Source Administrator window, select the FairCom 32bit driver and click Configure. When the configuration window is displayed, click the Options button.

The page size used by the driver is calculated by multiplying the Sector Size shown in this window by 128 bytes (for example, a Sector Size of 32 gives a page size of 4096 bytes). Try increasing the Sector Size from 16 to 32, then 64. Be sure to close your ODBC-compliant application and reconnect after each change to the ODBC driver configuration.

If this doesn't work, you may need to rebuild the index files. A corrupted index file (typically caused by a system going down without the files being closed first) could cause this error.

FairCom DB Error 49: FSAV_ERR

Error description:
Could not save file. In some situations, FairCom DB must ensure that all writes that have been issued to the filesystem for a FairCom DB data or index file have been flushed to disk. The server accomplishes this by issuing a “save” operation on the file, which involves a system call that forces the filesystem to write to disk all unwritten filesystem buffers for the file. If this flush fails, the server returns FairCom DB error 49.

Possible causes:
A file flush operation can fail for a variety of reasons, such as a disk media error, insufficient disk space, or a loss of connectivity to a network storage system.

Troubleshooting steps:
When a save operation fails, FairCom DB logs the following message to CTSTATUS.FCS:

ctsave failed: system code = <err> lc = <loc> fd = <fd>
<filename>

where <err> is the system error code returned by the failed flush call, <loc> is a FairCom DB location code, <fd> is the file descriptor passed to the flush call, and <filename> is the name of the file that could not be saved. Check the interpretation of the system error code to determine why the flush call failed.
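When monitoring tools need these values, the first line of the message can be parsed with sscanf(). This is a sketch that assumes the field layout shown above. The struct and function names are hypothetical. Storing fd as a long is an assumption, since the descriptor's type depends on the platform.

```c
#include <stdio.h>
#include <string.h>

/* Fields from "ctsave failed: system code = <err> lc = <loc> fd = <fd>". */
struct ctsave_failure {
    int system_code; /* <err>: system error from the failed flush call */
    int location;    /* <loc>: FairCom DB location code */
    long fd;         /* <fd>: file descriptor passed to the flush call */
};

/* Parses the first line of the message; returns 1 on success, else 0. */
int parse_ctsave_failure(const char *line, struct ctsave_failure *out)
{
    const char *p = strstr(line, "ctsave failed:");

    if (p == NULL)
        return 0;
    return sscanf(p, "ctsave failed: system code = %d lc = %d fd = %ld",
                  &out->system_code, &out->location, &out->fd) == 3;
}
```

The <filename> line that follows in CTSTATUS.FCS can then be read with the next fgets() call to identify the affected file.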

c-tree API Call Fails With Unexpected Error

If a c-tree API function call fails with an error that is not described in this document and the reason for the error is not clear from the context of the situation, consult the FairCom DB Function Reference Guide entry for the FairCom DB API function that returned the error.

Server Writes Unexpected Messages to Status Log

During server operation, FairCom DB writes messages to the server status log, CTSTATUS.FCS. The messages the server logs may be informational, warning, or error messages. The system administrator should monitor the server status log in order to detect situations in which the server writes unexpected warning or error messages to the status log.

The ctsysm utility can be used to monitor status log messages. This utility reads a configuration file containing the possible status log messages and associated actions depending on the context of the message. As the server logs messages to the status log, the utility examines the messages and outputs the corresponding message code, which can be matched to the appropriate action, if any, for the message. For details on using the ctsysm utility to monitor FairCom DB status log, see FairCom Server Status Monitoring Utility, ctsysm in the FairCom Server Administrator's Guide.

Server Exhibits Atypical Performance

During server operation, the server administrator can use FairCom DB and system monitoring tools to measure performance properties of FairCom DB. The application may also provide tools used to monitor the performance of the system or the database components of the system.

When these system monitoring utilities detect unexpected performance characteristics of the database components of the system, follow these steps to identify the cause of the unexpected performance so the problem can be understood and resolved:

Identify the nature of the unexpected performance characteristics of the system as specifically as possible. For example:
1. Can specific application operations or FairCom DB function calls made by clients be identified that are performing differently than expected? Application-specific metrics, the ctadmn utility, FairCom DB’s snapshot ability, and the function monitor can be used to identify the specific operations.
2. Does the system exhibit unexpected system or FairCom DB resource usage patterns (CPU, disk, network usage, etc.) at the time the unexpected performance patterns occur? Use system and FairCom DB utilities to monitor resource usage and compare to normal operation. Differences in resource usage from normal operation may provide insight into the nature of the unexpected performance behavior.
3. Is the unexpected performance behavior occurring consistently, or does it occur only occasionally? Any pattern that can be identified might help determine the cause of the behavior.
Identify any recent changes to the system that might account for the unexpected performance characteristics. For example:
1. Has the system load changed (for example, are more clients than usual using the server, or are the clients performing different operations than usual)?
2. Have there been any hardware or software changes (including FairCom DB configuration option changes)?

Server Exhibits Unexpected Resource Usage

When FairCom DB or system monitoring tools detect unexpected system or server resource usage, follow these steps to identify the cause of the unexpected resource usage so the problem can be understood and resolved:

Identify the nature of the unexpected resource usage as specifically as possible. For example:
1. Is the resource a system resource or a FairCom DB resource? If it is a system resource, use system tools to identify the process that is directly responsible for the unexpected resource usage. If the responsible process is FairCom DB, use application and FairCom DB monitoring tools to identify whether activity by particular clients accounts for the change in resource usage. System tools can be used to monitor system calls, dump a core image of the server, or capture stack traces for server threads. The ctadmn utility can be used to terminate clients suspected of contributing to the unexpected resource usage.
2. Does the unexpected resource usage occur consistently, or does it occur only occasionally? Any pattern that can be identified might help determine the cause of the behavior.
Consider whether any recent changes to the system could account for the unexpected resource usage. For example:
1. Has the system load changed (for example, are more clients than usual using the server, or are the clients performing different operations than usual)? Application monitoring tools and FairCom DB’s snapshot ability can help identify whether the load on the system has changed recently.
2. Have there been any hardware or software changes (including FairCom DB configuration option changes)?

Dynamic Dump Fails

A dynamic dump backup may fail for various reasons. The following are some possible causes of a failed dynamic dump:

The dump script does not exist or is inaccessible.
The dump script contains invalid options.
One or more files specified in the dump file list cannot be opened.
An error occurs when writing the dump stream file (for example, out of disk space or invalid path specified).

FairCom DB logs dynamic dump error messages and error codes to the server status log, CTSTATUS.FCS. In the event of a failed dynamic dump, examine the server status log.

FairCom DB can be configured to log more detailed dynamic dump progress entries to the server status log, including an entry for each file included in the dump, by adding DIAGNOSTICS DYNDUMP_LOG to the server configuration file before starting the server.

Data or Index File Sizes Grow Unexpectedly

The system administrator should monitor the size of FairCom DB data and index files in order to detect unexpected increases in file size. Monitoring file sizes helps avoid running out of disk space or reaching system file size limits and provides useful information in the event of unexpected server performance behavior or resource usage.

Except for files that are created with deleted space reclamation disabled (using the ctADD2END extended create mode), FairCom DB data and index files reuse deleted space. Fixed-length files reuse deleted space with complete efficiency and increase in size only after all deleted space has been reused.

Variable-length FairCom DB data files reuse deleted space by indexing deleted space by size and storing new variable-length records in the deleted space that most closely matches the size of the new record. FairCom DB also coalesces adjacent regions of deleted space in variable-length files into a single block of deleted space. Note that a variable-length file may increase in size even if deleted space is available if the available space is not large enough to store a newly-added record. For this reason, depending on the size of variable-length records and the order of insertion and deletion operations on variable-length records, deleted space in variable-length data files can become fragmented over time and the total file size could be larger than might be expected given the total size of active records in the file.
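The best-fit selection described above can be illustrated with a linear scan over a list of deleted blocks. This is only a model of the idea. It assumes that "most closely matches" means the smallest deleted block that is large enough. FairCom DB indexes deleted space by size rather than scanning a list, and this sketch does not model coalescing. The struct free_block type and the best_fit() function are hypothetical.

```c
#include <stddef.h>

/* Hypothetical deleted-space entry: where the block starts and its size. */
struct free_block {
    long offset;
    long size;
};

/* Returns the index of the smallest free block that can hold rec_size
   bytes, or -1 if none is large enough (the file would then grow). */
long best_fit(const struct free_block *blocks, size_t n, long rec_size)
{
    long best = -1;

    for (size_t i = 0; i < n; i++) {
        if (blocks[i].size < rec_size)
            continue; /* too small for this record */
        if (best < 0 || blocks[i].size < blocks[best].size)
            best = (long)i;
    }
    return best;
}
```

The -1 case corresponds to the situation described above: deleted space exists, but no single block is large enough for the new record, so the file grows.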

FairCom DB index files reuse space made available in index nodes by deleted key values and reuse nodes that have become empty and are pruned from the tree. A FairCom DB index file grows only when a new node is added and no empty nodes remain that can be reused.

You can reduce the size of a FairCom DB data file by compacting it, which removes deleted space (and also requires rebuilding the associated indexes). The FairCom DB API function CompactIFileXtd() can be used to compact a FairCom DB data file. Note that a file must be opened in exclusive mode in order to compact it.

To reduce the size of a FairCom DB index file by removing deleted nodes, rebuild the index file using the FairCom DB API function RebuildIFileXtd(). Note that this function requires exclusive access to the data file and its associated index files. If index file size is a concern, key compression may help reduce the overall index size by storing key values in compressed format. See the FairCom DB Programmer’s Reference Guide for details on creating an index that contains compressed keys.
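The following sketch models leading-character (prefix) compression to show why compressed keys can shrink an index. In this scheme, each key in a sorted node stores only the bytes that differ from the preceding key. This is an illustrative assumption, not a description of FairCom DB's compressed key format. The functions shared_prefix(), compressed_size(), and raw_size() are hypothetical helpers, and the one-byte prefix count is a simplification.

```c
#include <stddef.h>
#include <string.h>

/* Number of leading bytes that key b shares with key a. */
size_t shared_prefix(const char *a, const char *b)
{
    size_t n = 0;
    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}

/* Conceptual leading-character compression of a sorted key list:
   each key is stored as (count of bytes shared with the previous key,
   remaining suffix). Counts one byte per shared-count plus the suffix
   bytes. Not FairCom DB's on-disk node format. */
size_t compressed_size(const char *const *keys, size_t count)
{
    size_t total = 0;
    const char *prev = "";
    for (size_t i = 0; i < count; i++) {
        size_t shared = shared_prefix(prev, keys[i]);
        total += 1 + (strlen(keys[i]) - shared);
        prev = keys[i];
    }
    return total;
}

/* Uncompressed size for comparison: every key byte stored in full. */
size_t raw_size(const char *const *keys, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(keys[i]);
    return total;
}
```

For the sorted keys SMITH, SMITHSON, and SMYTHE, the model needs 15 bytes instead of 19. SMITHSON shares five leading bytes with SMITH, and SMYTHE shares two with SMITHSON.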

Server Terminates Abnormally

An abnormal server termination is a termination of a FairCom DB process that does not involve a clean server shutdown. FairCom DB may terminate abnormally for the following reasons:

A FairCom DB process encounters a fatal exception, causing the system to terminate the server process. In this case the system may produce a core image of the server process at the time of the exception. The FairCom DB status log may contain error messages related to the exception.
An administrator forcibly terminates the FairCom DB process. In this case, the process is abruptly terminated and the FairCom DB status log shows no indication that a shutdown occurred.
The system on which FairCom DB is running terminates abnormally (due to power loss, an operating system exception, or a sudden system reboot). In this case, the server process may be abruptly terminated, and if so, the status log shows no indication that a shutdown occurred.
FairCom DB detects an unexpected internal server error situation known as a catend or a terr. In these cases, the server status log shows an error message containing details about the internal server error.

In This Section

Recovering From Abnormal Server Termination

It is important to understand the reason for the abnormal server termination so that the appropriate information about the event can be saved and any necessary actions can be taken before restarting FairCom DB. Follow these steps after an abnormal FairCom DB termination occurs:

Examine system logs, application logs, and the server status log to determine the nature of the abnormal server termination.
1. If a fatal exception terminated the server process, save the core file if it exists.
2. If the server terminated due to a fatal exception or internal FairCom DB error, save a copy of the server’s *.FCS files, the server configuration file, and if time and disk space permit save a copy of all data and index files before restarting FairCom DB. These files can be used to analyze the abnormal server termination.
3. If the situation that led to the abnormal server termination can be understood by analyzing the server status log or other system logs, correct the problem that caused the server to terminate. For example, if the server terminated due to insufficient disk space which prevented the server from writing to its transaction logs, free up disk space to ensure the server has enough space for the transaction logs (but do not delete active transaction logs before the server performs its automatic recovery).
Determine the status of PREIMG and non-transaction data and index files and restore or recover these files as needed. PREIMG and non-transaction files are not under full transaction control, so in the event of an abnormal server termination, these files may be in an unknown state. Updates that had been written to the server’s cache but not to disk are lost, data files and index files may be out of sync, and PREIMG files may be in an inconsistent transaction state.
To determine if a PREIMG or non-transaction file needs to be restored or recovered, open the file using a standalone (non-client/server) FairCom DB utility. If the file opens successfully, the file is in good shape. If the file open fails with FairCom DB error 14, the file was updated but was not properly closed and its state is unknown, so the file must be restored or recovered. The options for restoring or recovering PREIMG and non-transaction files following an abnormal server termination are the following:
1. Re-create PREIMG and non-transaction files and reload their data from an external source if available, or
2. Rebuild the files to ensure the data and index files are in sync (although unwritten updates are still lost), or
3. Restore old copies of the files from backup.

Note: See the discussion of the WRITETHRU filemode and FairCom DB error 14 for details on the use of WRITETHRU and server configuration keywords to avoid error 14 for PREIMG and non-transaction files in the event of an abnormal server termination. Be aware that although these options provide ways to avoid error 14 in such a situation, it is still possible for WRITETHRU files to contain data/index inconsistencies or for PREIMG files to be in an inconsistent transaction state following an abnormal server termination.

Recover TRNLOG files using the server’s automatic recovery process. After following the above steps, TRNLOG files can be recovered by restarting FairCom DB. The server detects an abnormal server termination and performs automatic recovery of TRNLOG files, restoring the TRNLOG files to a consistent transaction state. Upon successful completion of automatic recovery, the server is fully operational. At this point clients can connect to the server and can resume their work.

For details on what steps to follow in the event of recovery or restore failures (such as automatic recovery failing), see the section titled “Failures During System Recovery”.

Failures During FairCom DB Shutdown

This section discusses failures that may occur during FairCom DB shutdown.

In This Section

Server Shuts Down Improperly

Server Shutdown Hangs or Takes Excessive Time

Server Shuts Down Improperly

This section discusses ways in which a FairCom DB shutdown may fail to complete properly. A normal FairCom DB shutdown is indicated by the following messages in the server status log:

Fri Sep 26 14:30:08 2003
 - User# 12     Server shutdown initiated
Fri Sep 26 14:30:09 2003
 - User# 12     Communications terminated
Fri Sep 26 14:30:09 2003
 - User# 12     Perform system checkpoint
Fri Sep 26 14:30:09 2003
 - User# 12     Server shutdown completed
Fri Sep 26 14:30:09 2003
 - User# 12     Maximum memory used was 116088930 bytes

A normal FairCom DB shutdown operation may not complete properly for the following reasons:

The server may terminate abnormally at shutdown due to a fatal exception.
The server may complete its shutdown without successfully terminating all active client threads.

If the server shutdown terminates abnormally due to a fatal exception, the server does not write the “Server shutdown completed” message to the server status log. Follow the recovery procedures described in the section “Recovering From Abnormal Server Termination”.

If the server shutdown completes without successfully terminating all active client threads, the server does not write a final checkpoint to the transaction logs. This causes the server to perform automatic recovery on its next startup. The server notes this situation at shutdown by logging the following message to CTSTATUS.FCS:

Mon Sep 29 16:15:51 2003
- User# 13     Clients active: skipped system checkpoint.
Auto-recovery on next start up

Treat this situation like an abnormal server termination, because clients may have been modifying PREIMG and non-transaction files at the end of server shutdown, which could mean that some updates were not written to disk. Before restarting FairCom DB, open PREIMG and non-transaction files using a standalone utility to determine whether they need to be rebuilt. The server’s automatic recovery ensures that TRNLOG files are in a consistent transaction state on the next server startup.
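When reviewing CTSTATUS.FCS before a restart, it can help to classify the most recent shutdown automatically. The following sketch assumes the status log has already been read into a string. It looks for the message texts shown in this section after the last “Server shutdown initiated” entry. The function and enum names are hypothetical, and the exact message wording may vary between FairCom DB versions.

```c
#include <string.h>

/* Outcome of the most recent shutdown recorded in a status log. */
typedef enum {
    SHUTDOWN_NOT_FOUND,          /* no "Server shutdown initiated" entry   */
    SHUTDOWN_CLEAN,              /* "Server shutdown completed" was logged */
    SHUTDOWN_SKIPPED_CHECKPOINT, /* clients were active; expect recovery   */
    SHUTDOWN_INCOMPLETE          /* initiated but never reported complete  */
} ShutdownState;

/* Return the last occurrence of needle in hay, or NULL if absent. */
static const char *find_last(const char *hay, const char *needle)
{
    const char *last = NULL;
    for (const char *p = strstr(hay, needle); p != NULL; p = strstr(p + 1, needle))
        last = p;
    return last;
}

/* Classify the last shutdown found in the text of CTSTATUS.FCS. */
ShutdownState classify_last_shutdown(const char *status_log)
{
    const char *start = find_last(status_log, "Server shutdown initiated");
    if (start == NULL)
        return SHUTDOWN_NOT_FOUND;
    /* Check the skipped-checkpoint message first: it signals pending
       automatic recovery even if the shutdown otherwise completed. */
    if (strstr(start, "skipped system checkpoint") != NULL)
        return SHUTDOWN_SKIPPED_CHECKPOINT;
    if (strstr(start, "Server shutdown completed") != NULL)
        return SHUTDOWN_CLEAN;
    return SHUTDOWN_INCOMPLETE;
}
```

Both SHUTDOWN_SKIPPED_CHECKPOINT and SHUTDOWN_INCOMPLETE call for checking PREIMG and non-transaction files with a standalone utility before restarting, as described above.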

Server Shutdown Hangs or Takes Excessive Time

The FairCom DB shutdown process may take a long time for the following reasons:

The server must write all updated data and index cache pages to disk before shutdown completes. If the server is configured to use a large cache, there can be many updated cache pages that must be written to disk at shutdown, which increases the shutdown time.
The server processes entries in the delete node queue at shutdown. The presence of many entries in the delete node queue can lead to increased server shutdown time.
The server allows clients time to recognize that the server is shutting down and to disconnect from the server. The shutdown time can be long if many clients are connected or if the server is not able to terminate connected clients.

In This Section

Monitoring FairCom DB Shutdown Progress

Forcibly Terminating FairCom DB During Shutdown

Monitoring FairCom DB Shutdown Progress

The progress of FairCom DB shutdown can be monitored in the following ways:

Monitoring messages FairCom DB writes to the server status log during shutdown.
Monitoring messages FairCom DB writes to its console or standard output during shutdown.
Monitoring system resource usage by FairCom DB during shutdown.

FairCom DB shutdown is normally accompanied by messages in the server status log such as the following:

 - User# 13     Process delete node Q....
 - User# 13     Clients still active.....
 - User# 13     Clients shutting down....

These messages indicate the current operation FairCom DB is performing, such as processing delete node queue entries, terminating connected clients, and allowing clients time to shut down.

The server also writes shutdown messages to its console window or standard output. These messages provide more detailed information than the status log entries, including the remaining number of delete node queue entries, and the current number of active client threads:

Process delete node Q.......<num_queue> entries.
Clients still active........<num_active>
Clients shutting down.......<num_active>

where <num_queue> is the current number of entries in the delete node queue and <num_active> is the current number of client threads that are still active.
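If the console or standard output is captured to a file, these counts can be extracted for monitoring. This minimal sketch assumes each progress message appears on its own line in the format shown above. The function parse_progress() is a hypothetical helper, and it skips a dot leader of any length rather than relying on a fixed number of dots.

```c
#include <stdlib.h>
#include <string.h>

/* Parse a console shutdown progress line such as
   "Process delete node Q.......<num_queue> entries." or
   "Clients still active........<num_active>".
   Returns the count, or -1 if the line does not start with `label`
   or no number follows the dot leader. */
long parse_progress(const char *line, const char *label)
{
    size_t len = strlen(label);
    if (strncmp(line, label, len) != 0)
        return -1;
    const char *p = line + len;
    while (*p == '.')              /* skip the dot leader */
        p++;
    char *end;
    long value = strtol(p, &end, 10);
    return end == p ? -1 : value;  /* -1 if no digits follow */
}
```

Sampling these counts periodically shows whether the delete node queue is draining and whether active clients are disconnecting, or whether shutdown has stalled.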

System utilities can also provide insight into FairCom DB’s progress during shutdown. Monitor disk activity on the FairCom DB data and index files to determine if the server is taking a long time to shut down because it is flushing data and index cache buffers. Monitor the number of active server threads to determine how many client threads are still active. See the “Monitoring FairCom DB Process State” section of this document for additional ways to monitor the state of the FairCom DB process.

Forcibly Terminating FairCom DB During Shutdown

If FairCom DB is taking a long time to shut down or appears to hang during shutdown, the server process can be terminated using system utilities, but be aware of the effect on FairCom DB data and index files. Forcibly terminating a FairCom DB process at shutdown effectively causes an abnormal server termination. This means that unwritten updates for PREIMG and non-transaction files may be lost, and that the server will perform automatic recovery on its next startup in order to ensure TRNLOG files are in a consistent transaction state. For details on the state of FairCom DB data and index files in the event of an abnormal server termination, see the topic “Server Terminates Abnormally” in the “Failures During FairCom DB Operation” section of this document.

Failures During System Recovery

This section discusses failures that may occur during system recovery.

In This Section

Automatic Recovery Fails

Dynamic Dump Restore Fails

File Rebuild Fails

File Compact Fails

Automatic Recovery Fails

At startup, FairCom DB examines the transaction logs to determine whether it needs to perform automatic recovery. If so, the server initiates recovery, and when the recovery successfully completes, the server startup continues as usual. In some cases, however, the automatic recovery process can fail. For example, automatic recovery may fail if:

The server’s transaction logs are damaged, missing, or inaccessible.
TRNLOG FairCom DB data or index files that automatic recovery determines it must process are damaged, missing, or inaccessible.
The server configuration file settings are inconsistent with the settings used the last time FairCom DB was run.

In This Section

Recovering from Automatic Recovery Failure

FairCom DB File Open Errors During Recovery

Automatic Recovery Terminates Abnormally

Automatic Recovery Takes Excessive Time

Recovering from Automatic Recovery Failure

If automatic recovery fails, FairCom DB logs error messages to its status log and terminates. In the event of an automatic recovery failure, proceed as follows:

Examine the server status log to determine the type of automatic recovery failure.
See the specific failure cases in the following sections for details on each type of recovery failure and if possible correct the problem. Restart FairCom DB and allow automatic recovery to complete successfully.
If the automatic recovery failure cannot be corrected, follow these steps to recover or restore TRNLOG files and to resume FairCom DB operation:
1. If automatic recovery terminated due to a fatal exception and the system generated a core file, save a copy of the core file for offline analysis.
2. Save a copy of the server’s transaction logs (*.FCS files), the server configuration file, and if time and disk space permit save a copy of all TRNLOG data and index files. These files can be examined offline to attempt to identify the cause of the automatic recovery failure.
3. TRNLOG files that were in use at the time of the abnormal server termination may be in an unknown state. To determine the state of each TRNLOG file, attempt to open it using a c-tree file open function. If the file opens successfully, the file is in good shape and did not need to be processed by automatic recovery. If the open fails with error 14, the file must be rebuilt, re-created, or restored from backup.
4. Verify that the server-maintained TRNLOG files FAIRCOM.FCS, SYSLOGDT.FCS, and SYSLOGIX.FCS can be properly opened. If the files fail to open (for example, with FairCom DB error 14 due to the failed automatic recovery) the server will fail to start up.
5. Move the existing transaction logs from the server directory to a temporary alternate directory (*). Transaction logs consist of the files S0000000.FCS, S0000001.FCS, and all files named L<lognum>.FCS, where <lognum> is a 7-digit number.
  (*) Transaction logs may contain important unrecovered data. We want to retain these existing logs in case further data recovery is required in extreme cases.
6. Restart FairCom DB. The server creates a new set of transaction logs and is ready for operation again.
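If step 5 is scripted, the file-name rules it gives can be checked programmatically. This ensures that only transaction logs are moved and that files such as FAIRCOM.FCS, SYSLOGDT.FCS, and SYSLOGIX.FCS stay in place. This sketch assumes a case-sensitive comparison and upper-case names as written above. The function is_transaction_log() is a hypothetical helper, and the actual move is left to the administrator's tooling.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if `name` is a transaction log as described in step 5:
   S0000000.FCS, S0000001.FCS, or L<lognum>.FCS with a 7-digit
   <lognum>. Case-sensitive matching is an assumption of this sketch. */
bool is_transaction_log(const char *name)
{
    if (strcmp(name, "S0000000.FCS") == 0 || strcmp(name, "S0000001.FCS") == 0)
        return true;
    /* "L" + 7 digits + ".FCS" is exactly 12 characters */
    if (strlen(name) != 12 || name[0] != 'L')
        return false;
    for (int i = 1; i <= 7; i++)
        if (!isdigit((unsigned char)name[i]))
            return false;
    return strcmp(name + 8, ".FCS") == 0;
}
```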

The following sections discuss specific automatic recovery failure situations and the options that are available in each case.

FairCom DB File Open Errors During Recovery

Automatic recovery fails if a TRNLOG file that must be processed during recovery cannot be opened. See the section titled “Errors Occur When Opening FairCom DB Files” for possible errors that may occur when opening FairCom DB files and what can be done in each case.

A special case that can occur during automatic recovery is that a file cannot be opened because an application used a FairCom DB API function to delete or rename the file. In some cases, automatic recovery does not realize that the file should be expected to be missing for this reason, and automatic recovery attempts to open the file and fails with FairCom DB error 12 because a file by the specified name does not exist.

Note: Creating TRNLOG files as transaction-dependent files (in which file creation and deletion are guaranteed to be atomic and these events are indicated by transaction log entries) avoids most occurrences of this type of situation, but does not guarantee that this situation cannot occur.

When this situation occurs, the server logs the following messages to the server status log:

Tue Sep 30 10:44:10 2003
 - User# 01     mark.idx: 12
Tue Sep 30 10:44:10 2003
 - User# 01
     *** Recovery may proceed by adding 'SKIP_MISSING_FILES YES' ***
     *** to the server configuration file.    ***
Tue Sep 30 10:44:10 2003
 - User# 01    Automatic recovery terminated with error: 12

As indicated in the server status log messages, the SKIP_MISSING_FILES server configuration keyword can be added to the server configuration file in order to avoid this error when a file is missing during automatic recovery. Because error 12 can occur for other reasons (for example, a file may be inaccessible to the FairCom DB process due to file permissions or due to an unavailable volume), confirm that the specified file does not exist and that there is a reasonable explanation as to why this file does not exist before adding the SKIP_MISSING_FILES option to the server configuration file and restarting FairCom DB.

Automatic Recovery Terminates Abnormally

If the FairCom DB process encounters a fatal exception during automatic recovery, causing the system to terminate the server process, the system may produce a core image of the server process at the time of the exception. The FairCom DB status log may contain error messages related to the exception.

In this situation, examine the server status log to see if there are any error messages that point to the cause of the exception. If the status log shows automatic recovery errors, consult the appropriate section above for actions based on the specific error code shown in the status log. FairCom DB may be restarted in order to retry automatic recovery, but if the recovery continues to fail in this manner and the problem cannot be corrected, follow the steps listed in the "Recovering From Automatic Recovery Failure" section.

Automatic Recovery Takes Excessive Time

Automatic recovery may take a long time to complete for the following reasons:

Server configuration settings, such as a larger log size or a longer checkpoint interval, may require the server to scan a large number of log entries and to process many transaction undo and redo operations.
TRNLOG indexes that do not use the LOGIDX filemode may need to have their tree structure reconstructed, which can increase recovery time.

In the event of a long automatic recovery, the server administrator has the following options:

Allow automatic recovery to complete (waiting as long as it takes).
Terminate the server process and restart recovery (if server configuration settings or other system properties can be changed to improve recovery speed).
Terminate the server process and abandon recovery (re-creating or restoring TRNLOG files from backup).

See the "Server Startup Hangs or Takes Excessive Time" section for details on monitoring automatic recovery progress and the “Recovering from Automatic Recovery Failure” topic above for steps to follow when choosing to abandon automatic recovery and re-create or restore TRNLOG files.

Dynamic Dump Restore Fails

Restoring files from a dynamic dump stream file may fail for various reasons. Some examples include:

The dump restore script is missing.
The dump restore script contains invalid options or options that are inconsistent with FairCom DB settings.
An error occurs when the dump restore attempts to restore files to the dump time.
The dump restore is performed in a directory with files that interfere with the restore procedure (such as existing transaction logs).

When a dump restore operation fails, the ctrdmp utility logs error messages to the file CTSTATUS.FCS. Check this file for a FairCom DB error code that explains the cause of the failure and take the appropriate action. Review the dump restore procedure to ensure that the proper steps were followed.

Data and index files are dumped to the dump stream file by reading the contents of the file from disk. Because a file is not instantaneously read in its entirety, a data or index file in the dump stream file may consist of the file contents as they appear over a period of time. Dump recovery includes the transaction log activity during the dump time, so that the file can be restored to its state at the time the dump began. For this reason, if the dump restore successfully extracts files from the dynamic dump stream file but fails when attempting to restore the files to the dump time, the restored data and index files are in an unknown state.

If ctrdmp fails when attempting to restore the files to the dump time and no solution can be found, you have two options. You can rebuild the affected data and index files to bring them back in sync, although the rebuild may fail because the data file may contain a mixture of record images from different points in time. Alternatively, you can restore the files from a different backup.

Note: Consider performing dump restore operations offline immediately after the dynamic dump backup is performed so that dump restore failures can be resolved at backup time rather than at restore time.

File Rebuild Fails

A file rebuild operation may fail for various reasons. Examples include:

The IFIL resource in the data file is missing or damaged.
The data file cannot be opened (for example, if the header of the file is corrupted).
The rebuild detects illegal duplicate key values.

When a rebuild fails, check the return code of the FairCom DB API function used to rebuild the file and consult the FairCom DB Function Reference Guide entry for that function to determine the appropriate action.

The ctrbldif utility can be used to rebuild a file that contains a valid IFIL resource. The utility opens the data file and retrieves the IFIL resource from the file. If the utility cannot read the IFIL resource, it prompts for the name of a file containing a good copy of the IFIL resource. Consider saving a copy of the file containing the IFIL resource for use in such a situation.

The ctrbldif utility and FairCom DB rebuild API functions support an option to handle the presence of illegal duplicate keys by marking records containing duplicate key values as deleted.

If a file rebuild fails and no solution is found which allows the rebuild to complete successfully, re-create the file and reload the data from an external source if available, or restore a backup copy of the file.

File Compact Fails

Like a file rebuild operation, a file compact operation may fail for various reasons. When a file compact operation fails, check the return code of the FairCom DB API function used to compact the file and consult the FairCom DB Function Reference Guide entry for that function to determine the appropriate action.

The ctcmpcif utility can be used to compact a file that contains a valid IFIL resource. The utility opens the data file and retrieves the IFIL resource from the file. If the utility cannot read the IFIL resource, it prompts for the name of a file containing a good copy of the IFIL resource. Consider saving a copy of the file containing the IFIL resource for use in such a situation.

The ctcmpcif utility and FairCom DB compact API functions support an option to handle the presence of illegal duplicate keys by marking records containing duplicate key values as deleted.

If a file compact fails and no solution is found which allows the compact to complete successfully, re-create the file and reload the data from an external source if available, or restore a backup copy of the file.

LockDump Output

LockDump() is a low-level FairCom DB function that creates a diagnostic dump of the FairCom DB internal lock table. This is useful in development and profiling activities to observe application locking behavior. The syntax of the function is as follows:

COUNT LockDump(COUNT refno, pTEXT dumpname, COUNT mode)

The possible legal combinations of the LockDump() parameters mode and refno are as follows:

mode	refno	Interpretation
ctLOKDMPfile	ctLOKDMPallfiles	Dump all locks by file.
ctLOKDMPfile	ctLOKDMPdatafiles	Dump all locks on data files.
ctLOKDMPfile	ctLOKDMPindexfiles	Dump all locks on index files.
ctLOKDMPfile	filno	Dump locks for file filno.
ctLOKDMPuser	ctLOKDMPallusers	Dump all locks by user.
ctLOKDMPuser	ctLOKDMPcaller	Dump locks for the user calling LockDump().
ctLOKDMPuser	userno	Dump locks for user userno.

The resulting lock dump output will be found in the file given by dumpname.

In all but one of the above combinations, the caller of LockDump() does not need to have any files open, although having files open is not a problem. In the case of ctLOKDMPfile/filno, the caller must have opened a file with file number filno. The userno referenced in the last combination is the thread ID assigned by FairCom DB; this thread ID is listed when ctadmn is used to list users logged on to FairCom DB. In addition to the location and type of each lock, the dump also lists users waiting for a lock.

Limitations

Since dumping all the locks in a very active system with many locks could affect performance, FairCom DB will ONLY support the LockDump() call if either of the following conditions is met:

The configuration file, ctsrvr.cfg, contains DIAGNOSTICS LOCK_DUMP.
The user calling LockDump() belongs to the ADMIN user group.

Lock Dump Contents

=================================================
All Files Lock Dump at Fri May 04 13:00:12 2007
----------------
----------------
SOMEFILE.FCS>>
        0000-013c9a16x T221 write/1: W060 W254 W740 W763 W758
        0000-002916abx T758 write/1: W774 W772 W771 W775 W773 W778 W779 W776 W071
cumulative lock attempts: 4002(616)  blocked: 21(0)  dead-lock: 0  denied: 0
Current file lock count: 0
----------------
cumulative I/O: read ops: 0 bytes: 0    write ops: 5 bytes: 16768
.
.
.
.
List of connected clients
-------------------------
User# 00002: (Node name not set)
User# 00012: (Node name not set)
=================================================

Description

In the example above, the following details can be obtained:

There are two records with locks held in SOMEFILE.FCS, each listed with its locked file offset value: 0000-013c9a16x and 0000-002916abx.
The thread IDs of the users holding the locks (T221 and T758).
The type of lock: write/1.
A list of thread IDs waiting for the record lock to be released (for example, W060 W254 W740 W763 W758).
Each waiting thread ID has a prefix indicating the type of lock it is waiting to obtain: R for a read lock or W for a write lock.
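For larger dumps, it may be convenient to extract these fields programmatically. The following sketch parses one lock line in the format of the sample above. The LockLine structure, parse_lock_line(), and the MAX_WAITERS limit are hypothetical. The format is inferred from the sample output rather than from a specification, so verify it against dumps from your server version.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WAITERS 64 /* hypothetical limit for this sketch */

/* One record-lock line from a file section of a lock dump. */
typedef struct {
    char offset[32];               /* e.g. "0000-013c9a16x" */
    int  holder;                   /* thread ID after 'T' */
    char lock_type[16];            /* e.g. "write/1" */
    int  waiter_count;
    char waiter_kind[MAX_WAITERS]; /* 'R' (read) or 'W' (write) */
    int  waiter_id[MAX_WAITERS];
} LockLine;

/* Returns 0 on success, -1 if the line does not match the format. */
int parse_lock_line(const char *line, LockLine *out)
{
    int used = 0;
    memset(out, 0, sizeof *out);
    /* offset, "T<holder>", lock type up to ':', then record position */
    if (sscanf(line, " %31s T%d %15[^:]:%n",
               out->offset, &out->holder, out->lock_type, &used) != 3 || used == 0)
        return -1;
    const char *p = line + used;
    char kind;
    int id, n;
    /* Each waiter is a one-letter lock kind followed by a decimal ID. */
    while (out->waiter_count < MAX_WAITERS &&
           sscanf(p, " %c%d%n", &kind, &id, &n) == 2) {
        out->waiter_kind[out->waiter_count] = kind;
        out->waiter_id[out->waiter_count] = id;
        out->waiter_count++;
        p += n;
    }
    return 0;
}
```

In the sample, T758 appears both as a lock holder and as a waiter (W758) on the lock held by T221. Parsed lines can therefore be cross-referenced to build a simple view of which thread is waiting on which.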

Types of Locks

The possible lock types are shown in the following table.

Lock Type	Value	Explanation
SS open	1	SS (strict serializer) logical Open lock
SS commit intent	2	SS commit intent lock
SS commit	3	SS commit lock
NS commit intent	4	NS (nonstrict serializer) commit intent lock
NS commit	5	NS commit lock
read	6	Read lock - A read lock requested and held by a user thread.
write/1	9	Exclusive write lock - A write lock requested and held by a user thread.
write/2	10	Exclusive write lock (no aggregate check) - An internal lock very briefly held by FairCom DB for files under transaction control. You may occasionally observe these in a system with a high transaction volume, and these can be safely ignored.
forcei cmtlok	11	A very briefly held commit read lock enforced by FairCom DB. These will only occur when the `COMMIT_READ_LOCK` option is enabled in the server configuration file. These may be occasionally observed in systems with high transaction volumes.

Note: The first five lock types listed in the table are only supported with a FairCom DB Server built with strict serialization support.

File Lock Info

cumulative lock attempts xxx (yyy) - Total number of file and (header) lock attempts. The header locks are internal FairCom DB locks required for critical updates to the file header.
blocked - Total number of locks and (header locks) that were blocked while another lock was held. In a high volume system, some blocked lock attempts are expected.
dead-lock - Total of dead-lock conditions reported for this file. These are generally not expected, and error DEAD_ERR (86) is returned to the application caller when this condition is detected. DEAD_ERR is returned when waiting for a write lock would cause a deadlock condition.
denied - Total number of locks denied to a caller with error DLOK_ERR (42). A lock is denied if the record is already locked. Note that blocking locks cause the thread to sleep until the lock is available, avoiding the DLOK_ERR.

Cumulative I/O

read ops - Total cumulative read operations for this file.
bytes - Total cumulative bytes read for this file.
write ops - Total cumulative write operations for this file.
bytes - Total cumulative bytes written for this file.
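The two per-file statistics lines can be parsed with sscanf() to track lock contention and I/O across repeated dumps. This minimal sketch assumes the exact labels and spacing pattern of the sample output. The structures and function names are hypothetical, and the blocked percentage and average write size are convenience figures computed here, not values reported by FairCom DB.

```c
#include <stdio.h>

/* Figures from a "cumulative lock attempts" line. */
typedef struct {
    long attempts, header_attempts;
    long blocked, header_blocked;
    long deadlocks, denied;
} LockStats;

/* Figures from a "cumulative I/O" line. */
typedef struct {
    unsigned long long read_ops, read_bytes;
    unsigned long long write_ops, write_bytes;
} IoStats;

/* Each parser returns 0 on success, -1 if the line does not match.
   A space in the format matches any run of whitespace in the input. */
int parse_lock_stats(const char *line, LockStats *s)
{
    int n = sscanf(line,
                   " cumulative lock attempts: %ld(%ld) blocked: %ld(%ld)"
                   " dead-lock: %ld denied: %ld",
                   &s->attempts, &s->header_attempts,
                   &s->blocked, &s->header_blocked,
                   &s->deadlocks, &s->denied);
    return n == 6 ? 0 : -1;
}

int parse_io_stats(const char *line, IoStats *io)
{
    int n = sscanf(line,
                   " cumulative I/O: read ops: %llu bytes: %llu"
                   " write ops: %llu bytes: %llu",
                   &io->read_ops, &io->read_bytes,
                   &io->write_ops, &io->write_bytes);
    return n == 4 ? 0 : -1;
}

/* Percentage of (non-header) lock attempts that were blocked. */
double blocked_percent(const LockStats *s)
{
    return s->attempts > 0 ? 100.0 * (double)s->blocked / (double)s->attempts : 0.0;
}

/* Average bytes per write operation; 0 when no writes were recorded. */
double avg_write_bytes(const IoStats *io)
{
    return io->write_ops > 0 ? (double)io->write_bytes / (double)io->write_ops : 0.0;
}
```

For the sample, 21 of 4002 lock attempts (about 0.5 percent) were blocked, and the five writes averaged about 3,354 bytes each.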

List of connected clients

A list of all connected clients is appended to the end of the lock dump output. This helps correlate user threads known at the application level with the threads shown holding or waiting for locks.

User# - The thread ID of the user as identified by FairCom DB
Node Name - The node name of the thread as assigned by the application.

Note: On Windows, the list of connected clients includes the IP address in addition to the user name and node name.

Locating a Record in LockDump Output

The record offset is split into ctreeRecordOffsetSelector-ctreeRecordOffsetLong in the LockDump output:

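ctreeRecordOffsetLong is the low word (low-order 32 bits) of the 64-bit record offset
ctreeRecordOffsetSelector is the high word (high-order 32 bits)

Combine them to get a complete record offset. In C, cast the high word to an unsigned 64-bit type before shifting it; on platforms where long is 32 bits (such as Windows), shifting a long by 32 bits is undefined behavior:

#include <stdint.h>
uint64_t offset = ((uint64_t)ctreeRecordOffsetSelector << 32) | (uint32_t)ctreeRecordOffsetLong;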
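When working from the text of a dump, the same combination can be applied directly to an offset token such as 0000-013c9a16x. Based on the sample output, this sketch assumes that both halves are printed in hexadecimal and that the trailing x marks hexadecimal notation. The functions parse_lockdump_offset() and combine_offset() are hypothetical helpers.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Combine the high (selector) and low 32-bit halves into 64 bits. */
uint64_t combine_offset(uint32_t selector, uint32_t low)
{
    return ((uint64_t)selector << 32) | low;
}

/* Parse a LockDump offset token such as "0000-013c9a16x":
   high word in hex, '-', low word in hex, trailing 'x'.
   Returns 0 on success, -1 if the token does not match. */
int parse_lockdump_offset(const char *token, uint64_t *offset)
{
    uint32_t high, low;
    char suffix;
    if (sscanf(token, "%" SCNx32 "-%" SCNx32 "%c", &high, &low, &suffix) != 3
        || suffix != 'x')
        return -1;
    *offset = combine_offset(high, low);
    return 0;
}
```

For the sample token 0000-002916abx, the result is 0x2916ab, which is offset 2692779 in SOMEFILE.FCS.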

How do I clean and reset the transaction numbers for my files?

FairCom provides the FairCom DB Clean Transaction Water-Mark utility, ctclntrn, to reset the high-water mark transaction numbers in your index files.

In the event of an impending transaction number overflow, the server administrator should follow these steps to restart transaction numbering:

Perform a clean FairCom DB shutdown.
Delete the transaction logs and housekeeping files: S0000000.FCS, S0000001.FCS, D*.FCS, I*.FCS, and L*.FCS.
Use the ctclntrn utility to clean all indexes (which resets the transaction numbers in the leaf nodes and index header) used by your application, including your application index files, superfiles, variable-length data, and FairCom DB files including the following if present:
- FAIRCOM.FCS -- If you are not using any user IDs or passwords other than ADMIN and GUEST, you can simply delete this file. (If you do delete this file, FairCom DB will re-create it with the single user ADMIN and password ADMIN.)
- CTSYSCAT.FCS -- This is only present if you are using c-tree ODBC Drivers.
- SEQUENCEIX.FCS
- SYSLOGIX.FCS
- SYSLOG*.FCS
- All FairCom DB SQL database dictionaries:
  - <databasename>.dbs/SQL_SYS/<databasename>.fdd
  - ctdbdict.fsd (session dictionary)
Restart FairCom DB.

The server will create new transaction logs and start transaction numbering from 1 again.

It is important to run ctclntrn on all files. If you miss any files and later open that file with a large transaction number in its header, the server will again increase its transaction number to that large value. You will need to repeat this procedure should that occur.

FYI: The ctclntrn utility uses the CleanIndexXtd() FairCom DB function. This function cleans any leaf nodes of exceptional transaction marks and resets the transaction high-water mark in the header to zero. This avoids rebuilding the entire index file.

You can use the FairCom DB High-water Mark Utility, cthghtrn, to verify that the transaction high-water marks in the files are back to zero or another reasonably low number.

Tip: The AUTO_CLNIDXX YES configuration option can help automate and detect files which have been missed.

Pending File ID Overflow: Error 534 in CTSTATUS.FCS

If FairCom DB automatically shuts down and the following message is found in CTSTATUS.FCS, the transaction file numbers have been exhausted:


  - User# 00018		 Pending File ID overflow: 534
  - User# 00018		 O18 M18 L58 F-1 Pfffff003x (recur #1) (uerr_cod=534)

This can be an issue on older FairCom DB versions (before revision 26980), which consumed a file ID number each time a FairCom DB data or index file was physically opened, even if the file was only read and never written. Newer versions consume a file ID number only when a file is physically opened and then updated.

Follow these steps to get back into operation:

Shut down FairCom Server cleanly: Verify that CTSTATUS.FCS shows that all connections were successfully closed and a final checkpoint was written, as indicated by the message "Perform system checkpoint" in CTSTATUS.FCS.
Remove your transaction log files (L*.FCS, S0000000.FCS, and S0000001.FCS).
Restart FairCom DB.

Note: The "Pending File ID Overflow" message is not to be confused with the message "Pending TRANSACTION # overflow" which has the same error code (534) but has a different cause.

See the topics below for more information:

Understanding the "Pending File ID Overflow" Message

The “Pending File ID Overflow” message indicates the FairCom Server internal file ID numbers are getting close to the upper limit.

Each time a transaction-controlled c-tree data file or index file is opened, the server's current file ID number increases. If your system has a large number of files, this value can increase substantially with each day of processing.

The upper limit for this value is: 4,294,963,200

If the upper limit is hit, the Server process will shut down.

The value at which a “Pending File ID Overflow” warning message first appears is: 4,227,858,432
The message “Pending File ID Overflow” is written to CTSTATUS.FCS, and a new entry is logged every time another 10,000 numbers are used. From the time the first warning message appears, you have at most 67,104,768 additional data file and index file opens before the upper limit is reached.
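The two thresholds are 0xFFFFF000 (4,294,963,200) and 0xFC000000 (4,227,858,432), and the 67,104,768 figure is simply their difference. The following minimal sketch encodes that arithmetic; the macro and function names are hypothetical, and the current value is assumed to come from the tfil column of ctstat -vat.

```c
#include <stdint.h>

#define FILE_ID_LIMIT        4294963200ULL  /* 0xFFFFF000: server shuts down here */
#define FILE_ID_WARNING_BASE 4227858432ULL  /* 0xFC000000: first overflow warning */

/* Number of file IDs left before the upper limit (0 if already reached). */
static uint64_t file_ids_remaining(uint64_t current_tfil)
{
    return current_tfil >= FILE_ID_LIMIT ? 0 : FILE_ID_LIMIT - current_tfil;
}

/* Nonzero once "Pending File ID Overflow" warnings are expected. */
static int file_id_warning_active(uint64_t current_tfil)
{
    return current_tfil >= FILE_ID_WARNING_BASE;
}
```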

When the transaction file numbers have been exhausted, error 534 and the following message will be logged in CTSTATUS.FCS:

  - User# 00018		 Pending File ID overflow: 534
  - User# 00018		 O18 M18 L58 F-1 Pfffff003x (recur #1) (uerr_cod=534)

If you get error 534, you must do a transaction log reset.

Determining the Current File ID

To determine the current value of your system’s file ID number, use the ctstat transaction snapshot (ctstat -vat). The file ID number is shown as the tfil value (the sample below shows a tfil of 233):

ctstat -vat -h 1 -i 1 1 -t -s FAIRCOMS
lowlog    curlog      lstent      lstpnt     lstsuc       tranno      tfil
    46        49     3217645     3217445          0      1045589       233
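If you collect ctstat -vat output in a log, the tfil value can be extracted programmatically. This is a minimal sketch that assumes the seven-column layout shown in the sample above (lowlog, curlog, lstent, lstpnt, lstsuc, tranno, tfil); the struct and function names are hypothetical, and other ctstat versions may print a different layout.

```c
#include <stdio.h>

struct vat_sample {
    long lowlog, curlog;
    unsigned long long lstent, lstpnt, lstsuc, tranno, tfil;
};

/* Parses one data line of ctstat -vat output. Returns 1 if all seven
 * columns were read; the header line ("lowlog curlog ...") returns 0. */
static int parse_vat_line(const char *line, struct vat_sample *s)
{
    return sscanf(line, "%ld %ld %llu %llu %llu %llu %llu",
                  &s->lowlog, &s->curlog, &s->lstent, &s->lstpnt,
                  &s->lstsuc, &s->tranno, &s->tfil) == 7;
}
```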

Recommended Actions

The following actions are suggested when the file ID warning message is seen.

First, determine how much time you have before the upper limit is hit and the server shuts down:

Use the ctstat -vat command (as shown in the previous section) on one of your highest-numbered transaction logs to get the current file ID value. Note that this value is captured only when the log is created, so the actual value keeps increasing while the active log is being processed.
Execute the ctstat -vat command on transaction logs from the previous day, first with the earliest log for the day and then with the last log for the day.
Calculate the difference in the file IDs. This will give you an idea of how many file IDs you have consumed during a given day so you can determine if you can safely wait until the next scheduled system restart.
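The calculation in the steps above can be sketched as follows, assuming you have the tfil values from the first and last logs of the previous day and a current tfil reading. The function name is hypothetical. Because tfil is captured only at log creation, the current reading understates the real value, so treat the result as an upper bound on the days remaining.

```c
#include <stdint.h>

#define FILE_ID_LIMIT 4294963200ULL  /* upper limit quoted above */

/* Estimated days until the file ID limit is reached, based on one day's
 * consumption. Returns -1.0 if no file IDs were consumed that day. */
static double days_until_file_id_limit(uint64_t day_start_tfil,
                                       uint64_t day_end_tfil,
                                       uint64_t current_tfil)
{
    if (day_end_tfil <= day_start_tfil)
        return -1.0;
    if (current_tfil >= FILE_ID_LIMIT)
        return 0.0;

    uint64_t per_day = day_end_tfil - day_start_tfil;
    return (double)(FILE_ID_LIMIT - current_tfil) / (double)per_day;
}
```

For example, if 2,000,000 file IDs were consumed yesterday and the first warning has just appeared, about 33.5 days remain at that rate.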

Transaction Log Reset

Once you can safely shut down the system, be sure to shut it down cleanly. The best practice recommendation for shutting down FairCom Server is as follows:

Cleanly shut down the FairCom Server.
Restart the FairCom Server and prevent any users from connecting.
Cleanly shut down the FairCom Server a second time. This second shutdown ensures that any pending transactions in the current logs are processed.
Now you may safely move the existing transaction logs to a new location. Move the following files: *.FCS and *.FCT
Copy the FAIRCOM.FCS file back to its original location.
The only *.FCS to keep in your current directory is FAIRCOM.FCS.

FAIRCOM.FCS stores user information such as user IDs, so if you don’t keep this file, you will have to recreate your users.

Restart the FairCom Server; it will create new transaction logs from scratch. You can confirm this by looking at the transaction log file names (on Unix/Linux: ls L*.FCS): the first log should now be L0000001.FCS.
The old logs saved in step 4 may be discarded.

If you would like to confirm that the file ID value has been reset, you can execute the ctstat -vat command again. You should see the file ID value is now a very low number.

Timeout Error Diagnosis

The question to answer is: what causes calls to FairCom DB made by FairCom DB client processes to fail with error 808/809? To answer it, it helps to know the answers to the following questions:

Is there a pattern to the times at which the errors occur?
Is there a pattern as to which processes encounter the error?
Are there any factors common to the customer sites that encounter the error (especially factors that are not present on systems that do not encounter the error)?
What FairCom DB function call returned error 808/809? If it's a record read call, that points to record locking as a likely cause.

Recent Observations

Some intervals between recent monitoring log entries are larger than expected (for example, 16 seconds even though lock dumps are taken every 5 seconds). This raises the following questions:

What could cause the intervals to be unexpectedly large? All the timestamps tell us is that the time between the taking of the two timestamps is 16 seconds. We don't know where the delay occurs.
How can we understand the cause of the delay? If it is FairCom DB delaying its response to the client, taking a process stack trace of FairCom DB will show us where in the code the threads are blocked. Knowing this, we can come up with ideas as to the cause.

Options to Consider

Understanding the cause of the error using the binaries that are already deployed at customer sites:

Review what we have learned so far.
Above all, the most likely cause of errors 808/809 is application lock behavior. This is at least partly confirmed by specific and documented cases^1,2. We should at a minimum attempt to rule out lock problems first in each case. This can be accomplished with SNAPSHOT.FCS data at the very least and lock dump data if possible.

Understanding the cause of the error that might require new FairCom DB or client application binaries:

Consider using the blocking lock timeout feature. With the blocking lock timeout, a call to FairCom DB that times out on a blocking lock request will fail with error 827, which will make it very easy to distinguish between delays that involve locking and those that do not. A pre-production test system is a good candidate for this type of test.
The client binary can be modified to capture a server stack trace at the appropriate time. Consider placing a pstack() call (as well as Snapshot() and LockDump() calls) in the FairCom DB client code immediately before error 808/809 is returned, to obtain a stack trace of the server at that instant. This is a strategic time and place to capture FairCom DB state.
Other ideas?
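The client-side capture idea could be sketched as a hook that classifies the return code and runs pstack on the server process. This is a minimal sketch under several assumptions: the client runs on the same host as the server and knows its PID, the application routes FairCom DB return codes through one place, and all names here are hypothetical. The Snapshot() and LockDump() calls are only indicated by a comment because their exact parameters are not covered in this section.

```c
#include <stdio.h>
#include <stdlib.h>

enum delay_kind {
    DELAY_NONE = 0,       /* not one of the errors under investigation */
    DELAY_TIMEOUT,        /* error 808 or 809 */
    DELAY_LOCK_TIMEOUT    /* error 827: blocking lock timeout */
};

static enum delay_kind classify_delay_error(int err)
{
    switch (err) {
    case 808:
    case 809:
        return DELAY_TIMEOUT;
    case 827:
        return DELAY_LOCK_TIMEOUT;
    default:
        return DELAY_NONE;
    }
}

/* Formats "pstack <pid> >> <logfile>". Returns 0, or -1 if buf is too small. */
static int build_pstack_command(char *buf, size_t len, long server_pid,
                                const char *logfile)
{
    int n = snprintf(buf, len, "pstack %ld >> %s", server_pid, logfile);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical hook, called just before an error is returned to the caller.
 * Error 827 already identifies a lock wait, so only 808/809 trigger capture. */
static void capture_server_state(int err, long server_pid)
{
    char cmd[256];

    if (classify_delay_error(err) != DELAY_TIMEOUT)
        return;
    /* Snapshot() and LockDump() calls would also go here. */
    if (build_pstack_command(cmd, sizeof cmd, server_pid,
                             "ctsrvr_pstack.log") == 0)
        (void)system(cmd);
}
```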

Proposed actions

Collect a complete repository of data for occurrences of the error. It is important to collect as much information about all occurrences of the error 808/809 as possible, so we can look for patterns. For each occurrence include:

Customer name/site at which the error occurred
Version of FairCom/Customer/system software running on the system
What is the socket timeout value that is in use on the system?
Relevant software configuration details: ctsrvr.cfg, anything else?
Relevant information about hardware in use on the system: # of CPUs, disk type, location of data/index files, transaction logs
Complete I/O subsystem details. SAN drive manufacturer, configurations, partitioning, RAID levels, backup methods
Time and date at which the error occurred
What processes encountered the error
What happened next: were processes restarted, and did the condition happen again or was operation normal after that?
What monitoring data do we have from both normal operation and the activity at the time the error occurred? Examples include SNAPSHOT.FCS, lock dump log, application error logs, CTSTATUS.FCS, FairCom DB process stack traces, any other system logs such as disk data (sar) or CPU data?
What analysis have we done on the available data? Did we at least examine SNAPSHOT.FCS, lock dump log, CTSTATUS.FCS, FairCom DB process stack trace?
Discuss specific cases that showed up as errors 808/809 and have since been resolved.

Heap Debugging on Solaris 9+ Operating Systems

You can enable heap debugging for the FairCom DB process by setting these environment variables before startup.

export LD_PRELOAD=libumem.so.1
export UMEM_DEBUG=default
export UMEM_LOGGING=transaction

This is a low overhead malloc() debugging method, suitable for a production environment experiencing possible heap corruption. With the above options, it will check some guard zones for memory overwrites on alloc/free.

libumem also provides many mdb commands (dcmds) for examining a core file generated while libumem was loaded.

::umem_status and ::umem_verify are useful, and the complete list of dcmds can be found in mdb by executing:

> ::dmods -l libumem.so.1

The umem_debug man page has more details.

watchmalloc is another malloc debugging library on Solaris. It detects memory overwrites more reliably but runs very slowly, so it is not suitable for a production environment.

prstat and Performance Monitoring on Solaris Operating Systems

The prstat utility can give a view of the FairCom DB process activity including user and system time and time waiting for locks.

The following shell script can be used to run prstat on the ctsrvr process at 5-second intervals. Specify the ctsrvr process ID (PID) as the command-line argument:

#!/bin/csh
prstat -Lmc -p $1 5 | nawk '$1=="PID" { "date" | getline d ; close("date"); print d} { print $0 }' > ctsrvr_prstat.log

The pstack utility is very useful for peering into the FairCom DB process. It writes call stacks for all of the FairCom DB threads. If you can run pstack on the ctsrvr process at times you observe long response times, it is possible to see exactly what code the threads are executing, and if they are waiting on any particular resources.

The example below is a script to call pstack. The ctsrvr process ID is the command-line argument:

#!/bin/csh

date >> ctsrvr_pstack.log
pstack $1 >> ctsrvr_pstack.log

Using Windows Process Explorer to Obtain Thread Call Stacks

Process Explorer is a Microsoft tool for viewing the internal properties of a running Windows process. It can be very valuable for observing FairCom DB behavior when things are not functioning as expected. Follow these steps to use it to view the threads of a running FairCom DB process:

Download Process Explorer and install it.
Run Process Explorer and select the ctsrvr.exe or ctreesql.exe process. Right-click and select Properties.
Select the Threads tab. Select the thread that is showing the most CPU use and click the Stack button.
Click Copy to copy the stack, and send the result to FairCom for analysis if requested.

Here's a screen snapshot showing a typical FairCom DB thread call stack:

Generating Dump Files on 64-bit Windows

Windows Task Manager can generate dump files, which can be analyzed to help diagnose software problems. By default, Task Manager creates 64-bit dumps even if the source is a 32-bit process. This type of dump (a 64-bit dump of 32-bit process) is difficult to debug because only windbg supports them and not all the functionality is available.

An alternative version of Task Manager, taskmgr.exe, is available in the c:\windows\syswow64 folder. This version creates 32-bit dumps of 32-bit processes, which allows for more thorough debugging.

Another utility, procdump, will automatically create the preferred dump format for all processes. To read about this utility and download it, visit:

http://technet.microsoft.com/en-us/sysinternals/dd996900

The usage and parameters are documented at the link provided above. Several parameters useful for debugging include:

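-c - CPU threshold at which to create a dump of the process.
-m - Memory commit threshold in MB at which to create a dump of the process.
-n - Number of dumps to write before exiting.
-s - Consecutive seconds the threshold must be exceeded before a dump is written.
-t - Write a dump when the process terminates.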

For example, to write up to 3 mini dumps of a process named 'consume' when it exceeds 20% CPU usage for five seconds:

C:\>procdump -c 20 -s 5 -n 3 consume

Transaction Log Increases

The most likely cause is a long-running transaction: one transaction stays active while one or more other clients fill the transaction logs with their own transaction activity. The server must keep the log containing the transaction's begin entry until the transaction either commits or aborts.

Questions to consider

Is there more than one client executing transactions?
Is there one client that has a transaction that doesn't commit for a relatively long time?

Steps to take

Look in CTSTATUS.FCS for the message “The number of active log files increased to ...” and read the explanation that follows it. It will probably include a message such as:

Transaction (started in log <log_number>) still pending.
		User# <taskid> |<username>|<nodename>|

If this is the case, identify which client that is and determine what it is doing. Is it expected that this client has had a transaction pending for so long?
If there is a different explanatory message than the above, please send CTSTATUS.FCS to FairCom support to examine.
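On a busy server, CTSTATUS.FCS can be long, so a small scanner can help pull out the pending-transaction details. This is a minimal sketch assuming the two message lines look exactly as shown above; the function names are hypothetical, and real log lines may carry extra text, which the strstr() search skips over.

```c
#include <stdio.h>
#include <string.h>

/* Returns the log number from a line containing
 * "Transaction (started in log <log_number>) still pending.", or -1. */
static long pending_tran_log(const char *line)
{
    const char *p = strstr(line, "Transaction (started in log ");
    long log_number = -1;
    int end = 0;

    if (p == NULL)
        return -1;
    /* %n records how far the match got; it stays 0 if the tail differs. */
    if (sscanf(p, "Transaction (started in log %ld) still pending%n",
               &log_number, &end) != 1 || end == 0)
        return -1;
    return log_number;
}

/* Parses the "User# <taskid> |<username>|<nodename>|" line that follows.
 * user and node must each hold at least 64 bytes. Returns 1 on success. */
static int parse_pending_user(const char *line, int *taskid,
                              char user[64], char node[64])
{
    const char *p = strstr(line, "User#");

    if (p == NULL)
        return 0;
    return sscanf(p, "User# %d |%63[^|]|%63[^|]|", taskid, user, node) == 3;
}
```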

Additional Transaction Log Number Messages

The number of active log files required to support server operation is nominally four (4). A number of conditions require more active log files. The most important of these are:

Increasing CHECKPOINT_FLUSH to delay flushing of buffers associated with committed transactions.
A pending transaction that began several logs ago, and has not yet committed or aborted.
Dynamic dumps.

When the number of log files is about to increase, a message to that effect is written to CTSTATUS.FCS, followed by an explanation of the condition that triggered the increase. When the increase is caused by a pending transaction, FairCom DB attempts to identify the user ID and node name associated with the transaction.

When the number of log files is not permitted to increase (because FIXED_LOG_SPACE YES is in the configuration) and the need for more logs is caused by a pending transaction, the server attempts to disconnect the client associated with the transaction, unless the following keyword is in ctsrvr.cfg:

COMPATIBILITY NO_TRAN_DISCONNECT

If the client is not disconnected and does not make a subsequent server request, the pending transaction will eventually cause the server to terminate abruptly with error L56. The server terminates because it cannot ensure that a commit or abort will be added to the transaction logs before the log that holds the TRANBEG entry becomes inactive. (If the client does make a server request, it sees the transaction attribute indicating that the transaction must be abandoned, and the abnormal shutdown is avoided.)

Dynamic Dump Restore FMOD_ERR (48)

The FairCom DB Dynamic Dump feature creates 1 GB file extents by default. If a dump's size is at or near a multiple of this extent size, it may produce X extents on some days and X-1 on others, depending on the size of the data at the time of the dump.

When restoring these dumps, be sure not to include any extraneous extent files that do not belong to the dump being restored. Check the date and time stamp of each extent file to make sure the files belong together as a group.

An easy way to avoid this problem is to disable the file extent feature:

!EXT_SIZE NO

This is a recommended default dump script option.

FUSE_ERR (22) During Automatic Recovery

Whenever a transaction-controlled FairCom DB file is opened, it is assigned a unique "File Id" that is used to reference it in the transaction logs. This number is stored as a 4-byte integer. If large numbers of files are repeatedly opened and closed, File Ids can be consumed rapidly, leading to a "Pending File ID overflow" message in CTSTATUS.FCS. This message is logged every 10,000 file opens. The File Id can be reset by shutting down the server cleanly (so no recovery is needed) and removing the transaction logs, that is, the L*.FCS and S000*.FCS files.

FUSE_ERR during recovery indicates that recovery needs a larger FILES setting to complete. The RECOVER_FILES keyword controls this specifically:

RECOVER_FILES <number of files | NO>

Newer versions (9.3 and later) assign File Ids only when a file is actually updated, not when it is merely opened and read, which substantially extends the interval before the next overflow.

Activation Failures (Error 26) on AIX 6

A previously activated server could not be activated with a newly provided activation key: the fcactvat utility failed with system error 26, "Text file busy or in use".

AIX 6 can cache shared objects (in this case, ctreedbs.so), which prevents fcactvat from stamping the binary. The AIX 6 utility slibclean releases the object from the cache, allowing successful stamping.

CPUs Report Different Times on Linux, Causing Unexpectedly Long sleep() Times

On a multi-core system running CentOS Linux, calls to sleep() were observed to take an unexpectedly long time. The suspected cause was that the system time reported by each CPU differed, which the following experiment confirmed:

The Linux taskset utility can be used to run a process on a specified CPU. Running date on each CPU in turn showed a different time on each:

taskset -c 0 date
Mon Aug 8 11:20:04 CDT 2011
taskset -c 1 date
Mon Aug 8 11:20:19 CDT 2011
taskset -c 2 date
Mon Aug 8 11:20:16 CDT 2011
taskset -c 3 date
Mon Aug 8 11:20:20 CDT 2011

In this case the system was a Hyper-V virtual machine, and adding the following kernel boot options resolved the problem, as described here:

http://hardanswers.net/correct-clock-drift-in-centos-hyper-v

divider=10 clocksource=acpi_pm (for 32bit kernel)

Prevent FPUTFGET LNOD_ERR Error (50) from OpenIFile()

A FairCom DB FPUTFGET application reported numerous LNOD_ERR errors (50, Could not lock node). Changing the fclock setting on the customer's computer to a much larger value resolved this unusual error.

Connection and Startup Issues

If replication is not working, check the following:

Is replication enabled for the file? Use the ctinfo utility to check this:

ctinfo yourreplicatedfile.dat  ADMIN ADMIN FAIRCOMS

Look for:

Extended File Mode Details:
	ctREPLICATE      : file is replicated

If replication is not enabled for the file:
1. Check that the REPLICATE option is properly specified in ctsrvr.cfg.
2. Add DIAGNOSTICS REPLICATE to ctsrvr.cfg. When opening a file, a message is logged if replication cannot be enabled for the file.
If replication is enabled for the file:
1. Check that the Replication Agent is running and is properly connected to the source and target servers.
2. Check source server transaction log entries. Use the ctrepd utility and/or Replication Agent change log (enable the log_change_details option in ctreplagent.cfg).
3. Check the replication exception log for errors:
  - Did the Replication Agent open the file on the target server?
  - Did applying adds/deletes/updates fail?

If you are using two-way replication or multiple Replication Agents connected to a server, be sure to set the unique_id option in the Replication Agent configuration file so that each Replication Agent has its own unique ID.

If you are doing two-way replication between servers on the same machine, use the REPL_NODEID option in both servers' configuration files to set unique node IDs for the servers.

If you use localhost or the DNS name for source_server or target_server in ctreplagent.cfg, you will need to use REPL_NODEID.

Replication and Low-Level Operations - Error 919

Low-level operations are not replicated, so these operations fail on replicated files with error 919, REPL_ERR "Low-level operations are not allowed on replicated files."

You will need to disable replication on the file prior to executing a low-level operation. To disable replication you can use a recent version (V11 or later) of the cttrnmod utility (located in tools/cmdline/admin/client). This requires exclusive access to the file.

To disable replication on a file prior to executing a low-level operation, follow these procedures:

Check whether replication is enabled for this file in ctsrvr.cfg via the REPLICATE keyword (either explicitly or via a wildcard). You will need to remove this keyword and restart the server to prevent replication from being immediately re-enabled. If this is not convenient, you can temporarily disable transaction logging for a set of files so the keyword has no effect on these files:

cttrnmod set P -f <filelist> -u <user> -p <password> -s <server>

The -f <filelist> option specifies the name of a text file containing names of FairCom DB data files, one per line. When this option is specified, the utility operates on all files specified in that text file.

Disable the replicate bit:

cttrnmod set repl=off -f <filelist> -u <user> -p <password> -s <server>

Run your low-level operation on the file.
Re-enable transaction logging (if you disabled it in step 1):

cttrnmod set T -f <filelist> -u <user> -p <password> -s <server>

Re-enable replication:

cttrnmod set repl=on -f <filelist> -u <user> -p <password> -s <server>

If you removed the REPLICATE keyword, you can restore that now and restart the server again.

To restart replication on this file, you should generally do a full re-sync of all replicated files. If your replication target is fully caught up with the source server, it is safe to re-sync only this file.

gdb Remote Debugging

This section shows an example of how to remote debug using gdb, the GNU Project Debugger, on a different system architecture. In this example:

Host = x86 Linux
Target = ARM Linux

gdbserver must be compiled for and run on the target system, and gdb itself must be built specifically to be aware of the target architecture.

See: https://sourceware.org/ml/gdb/2005-02/msg00074.html

To build gdb with auto-detect host (x86) and ARM-Linux target:


$ cd gdb-6.3
$ ./configure --target=arm-linux
$ make
$ file gdb/gdb
gdb/gdb: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for
GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped
$ cd gdb/gdbserver
$ export CC=/usr/local/bin/arm-linux-gcc
$ ./configure --host=arm-linux
$ make
$ file gdbserver
gdbserver: ELF 32-bit MSB executable, ARM, version 1 (ARM), for GNU/Linux
2.4.3, dynamically linked (uses shared libs), not stripped

Copy gdbserver to the target and use it to run your program there, listening on a TCP/IP port as shown below. gdbserver can also use a serial (COM) port in place of host:port.

# gdbserver host:2345 ./ctsrvr

On the host, start the target-aware gdb you built and issue the following commands:


#local copy of binary to debug
(gdb) file ./ctsrvr
#path to local "root" for resolving system libraries with absolute paths
(gdb) set sysroot /usr/local/opt/crosstool/arm-linux/gcc-3.3.4-glibc-2.3.2/arm-linux
#path to any local copies of libraries using relative paths
(gdb) set solib-search-path /usr/local/opt/crosstool/arm-linux/gcc-3.3.4-glibc-2.3.2/arm-linux/lib
#attach to remote process host:port
(gdb) target remote ts7200:2345

"ctntio error" Entries in Status Log File CTSTATUS.FCS

SYMPTOMS

Entries such as the following line appearing in a FairCom DB status log file CTSTATUS.FCS:

 - User# <taskid>	ctntio: read error – O<taskid> bytes=0 pErr=127
	|<userid>|<nodename>|: 161
 - User# <taskid>	ctntio: send error – O<taskid> bytes=0 pErr=128
	|<userid>|<nodename>|: 161

where:

<taskid> is the task ID assigned to the FairCom DB thread that logged the error message
<userid> is the user ID for the thread that logged the error message
<nodename> is the node name for the thread that logged the error message

CAUSE

When a communication error occurs, the FairCom DB logs a ctntio error message in the status log file CTSTATUS.FCS. The most common cause of the “ctntio: read error” message is that a client process terminated without first disconnecting from FairCom DB. A “ctntio: send error” message can occur if the client process terminates while FairCom DB is attempting to send a response to the client process.

One common way a client process terminates without first disconnecting from FairCom DB is a user turning off their machine without first properly logging out of the application.

The following are other possible causes of ctntio errors in CTSTATUS.FCS:

Physical network problems
An overworked network transport layer that is timing out and doing retries

RESOLUTION

When investigating the cause of ctntio errors, first check for the most common cause: a client process terminated without properly disconnecting from FairCom DB. Note that the application can set the node name after connecting to FairCom DB by calling the SetNodeName() FairCom DB API function. Setting a descriptive node name (including details such as process ID, thread ID and client machine name or IP address) can help you associate ctntio error messages with the corresponding client process.
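A descriptive node name can be built with ordinary string formatting before it is passed to SetNodeName(). This is a minimal sketch: the helper name and the "<host> pid=<pid> tid=<tid>" format are hypothetical, the caller is assumed to have converted its thread ID to an unsigned long, and you should check the maximum node name length your FairCom DB version accepts. The SetNodeName() call itself is not shown.

```c
#include <stdio.h>

/* Hypothetical helper: formats "<host> pid=<pid> tid=<tid>" into buf for
 * use as a node name after connecting (pass buf to SetNodeName()).
 * Returns the string length, or -1 if buf is too small. */
static int build_node_name(char *buf, size_t len, const char *host,
                           long pid, unsigned long tid)
{
    int n = snprintf(buf, len, "%s pid=%ld tid=%lu", host, pid, tid);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

For example, build_node_name(buf, sizeof buf, "ws42", 1234, 7) produces "ws42 pid=1234 tid=7", which makes it easy to match a ctntio entry in CTSTATUS.FCS to the process that dropped its connection.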

If you rule out the most common explanation, check whether the client application is also getting errors such as ARQS_ERR (127, could not send request) or ARSP_ERR (128, could not receive answer) when calling FairCom DB functions. If so, check for possible network problems as follows:

Ensure FairCom DB's host machine is not burdened beyond its capacity. Using a more powerful machine or limiting the number and types of applications on a machine can improve performance and limit errors at the communication level. Also, ensure no specific application is over-using resources on the host machine.

If you find that the ctntio errors are due to heavy network activity, increasing the priority of a FairCom DB process can eliminate or reduce the occurrence of ctntio errors. This should be done cautiously as it will affect other applications running on the same machine.

The error messages in the status log can be turned off, but unless they are an inconvenience, this is not recommended. The messages serve as a good health check on the state of your network and may be an early warning of more serious network and system problems. To disable the messages, add CTSTATUS_MASK VDP_ERROR to the ctsrvr.cfg configuration file and restart FairCom DB.

MORE INFORMATION

To provide more context, the following is a short, high-level description of how FairCom DB client/server communication operates.

FairCom DB communication is initiated by the client process: the client makes the connection, the client sends the request, and the client normally terminates the connection. FairCom DB is only a "listener and responder".

In a FairCom DB process, there is a thread for each client attached to FairCom DB which is waiting on a blocking read until the next request from the client. Effectively it waits forever.

If the operating system determines that the communication channel is invalid for whatever reason (crashed client, broken network connection, overloaded network layer that cannot handle messages in a timely manner), the blocking read is released. When the blocking read returns, the server attempts to read data from the communication channel but there is none. Based on the error code returned by the socket read operation, the thread determines whether it should retry the read. If retry is not an option or the maximum number of retries has been reached, the ctntio read error message is written to the status log file CTSTATUS.FCS and the client session is marked for termination and cleanup. This is not a fatal error for FairCom DB, since it can abort any open transactions for the user and data integrity is assured. The effect is more apparent on the client side, because the client can make no further requests to FairCom DB unless it reconnects.

"WARNING: ct_lflsema livelock" Entries in Status Log File CTSTATUS.FCS

SYMPTOMS

Entries such as the following lines appearing in FairCom DB's status log file CTSTATUS.FCS:

WARNING: ct_lflsema livelock
Log flush from buffer overflow. Current LFW Channel block not set: 1

CAUSE

This condition occurs when the COMMIT_DELAY logic is enabled and FairCom DB is trying to acquire a lock on the log flush semaphore (ct_lflsema). If a large number of retries leads neither to acquisition of the semaphore nor to flushing of the log, a warning about a possible livelock is posted in the status log, and the commit delay logic is automatically disabled. Disabling the commit delay logic can reduce performance.

One condition that can trigger a "ct_lflsema livelock" warning is a log extension operation on large transaction log files (L*.FCS). If the operation takes too long to complete, it can cause excessive looping that in turn produces the warning.

RESOLUTION

If the cause of the warning is latency during log extension, the following options can reduce that latency:

Enable the Transaction Log Template feature. With this feature enabled, empty template log files are created at server startup. Whenever a new log is required, the corresponding blank template file is renamed to LXXXXXXX.FCS instead of a new log file being created from scratch.
Decrease the size of the transaction log files using the configuration keyword LOG_SPACE.
Upgrade the underlying hardware and/or file system where the transaction log files are stored.
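The idea behind the Transaction Log Template feature (pay the cost of creating a full-size log file ahead of time, then obtain a new log with a cheap rename) can be sketched in C as below. This is an illustration under assumptions, not FairCom's implementation; the functions create_template() and activate_log() and any file names passed to them are hypothetical.

```c
#include <stdio.h>

/* At startup (hypothetical): write a zero-filled template file of the given size so
   that the expensive allocation happens before any transaction is waiting.
   Returns 0 on success, -1 on failure. */
static int create_template(const char *path, long size)
{
    static char zeros[4096];               /* static storage is zero-initialized */
    FILE *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    while (size > 0) {
        size_t chunk = size < (long)sizeof zeros ? (size_t)size : sizeof zeros;
        if (fwrite(zeros, 1, chunk, f) != chunk) {
            fclose(f);
            return -1;
        }
        size -= (long)chunk;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* At log switch (hypothetical): turn the template into the next log with a
   single rename() instead of creating and extending a new file. */
static int activate_log(const char *template_path, const char *log_path)
{
    return rename(template_path, log_path);
}
```

The design point is that rename() within one file system is a metadata operation, so its cost does not grow with the log size, whereas creating and extending a large file on demand does.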

MORE INFORMATION

It is impossible to know whether a detected livelock represents a true deadlock or simply a timing issue that would have cleared if the limit on the retry loop counter were higher. The limit is set by a #define, so increasing it requires a customized build of FairCom DB.

For additional information about the Transaction Log Template feature, please read the Transaction Log Template White Paper.

Disappearing FairCom DB Core Files on Linux

Your FairCom DB process unexpectedly dies. FairCom asks to see your core file for analysis. You look and it’s not there! This may occur for any of the following reasons:

Not enough memory.
Not enough drive space.

The Red Hat ABRT daemon (abrtd), found on modern Red Hat Linux systems, sets its own limits on core size and other crash information, in addition to those set by ulimit. You should become familiar with this very useful utility suite. More importantly, ABRT is tightly integrated with the Red Hat package management system for automatic bug reporting. As a result, it may delete core files from executables it does not recognize as soon as they are written.

Why does the abrtd daemon delete recently created application core dumps?

Look for messages such as the following in your system logs:

Dec 12  3:19:22 hostname abrtd: Directory 'ctreedbs-13412346-1725' creation detected
Dec 12  3:19:22 hostname abrtd: Executable '/home/FairCom/servers/bin/ace/sql/ctreesql' doesn't belong to any package
Dec 12  3:19:22 hostname abrtd: Corrupted or bad crash /var/spool/abrt/ctreedbs-13412346-1725 (res:4), deleting

This happens to applications that were not installed through the Red Hat package management tool. System administrators should consider configuring ABRT to process crashes of unpackaged executables such as FairCom DB. This is done with the following options in your /etc/abrt/abrt.conf configuration file:

ProcessUnpackaged = yes/no
This directive tells ABRT whether to process crashes in executables that do not belong to any package. The default setting is no.
SaveBinaryImage = yes/no
This directive specifies whether ABRT's core catching hook should save a binary image to a core dump. It is useful when debugging crashes which occurred in binaries that were deleted. The default setting is no.

FairCom also advises configuring ulimit to allow core files of unlimited size (ulimit -c unlimited), within available storage space, of course.
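ulimit -c applies to the shell and to the processes it starts. A program you control (for example, a wrapper that launches FairCom DB) can do the same for itself with getrlimit() and setrlimit(); resource limits are inherited by child processes across fork() and execve(). The sketch below raises the soft RLIMIT_CORE limit to the hard limit, which is RLIM_INFINITY when unlimited core files are permitted; raising the hard limit itself requires privileges. The function name raise_core_limit() is hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raises this process's soft core-file size limit to its hard limit.
   Same effect as `ulimit -c unlimited` when the hard limit is RLIM_INFINITY.
   Returns 0 on success, -1 on failure. */
static int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;   /* an unprivileged process may raise soft up to hard */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```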

Linux systems using systemd must also set the following in the [Coredump] section of /etc/systemd/coredump.conf:

Storage=external
ProcessSizeMax and ExternalSizeMax must be set larger than the virtual memory size of the faircomdb process; otherwise the resulting core file may be truncated and unusable.
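To see where core files are going, check the kernel's core_pattern setting. As documented in core(5), when /proc/sys/kernel/core_pattern begins with a pipe character (|), the kernel passes the core dump to the named user-space program (for example, ABRT's hook or systemd-coredump) instead of writing it to a file, so that program's settings decide whether the core is kept. The sketch below, using the hypothetical function name core_pattern_is_piped(), performs that check; running cat /proc/sys/kernel/core_pattern in a shell shows the same information.

```c
#include <stdio.h>
#include <string.h>

/* Reads the kernel core_pattern into buf (Linux only). Returns 1 if core dumps
   are piped to a user-space handler, 0 if they are written directly to a file,
   or -1 if the setting cannot be read. */
static int core_pattern_is_piped(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return buf[0] == '|' ? 1 : 0;
}
```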

FairCom DB Memory Use and glibc malloc per-thread Arenas

During testing and debugging, we observed that newer Linux versions produce noticeably larger core files than memory usage statistics would suggest. A core file is normally close to the size of the process’s working memory space. Some research turned up a somewhat surprising finding that many Linux users, glibc users in particular, should be aware of.

The C runtime (glibc 2.10 and later) allocates a new heap for each thread on that thread's first memory allocation call, and on 64-bit systems each such heap reserves 64 MB of virtual address space. This behavior can be reproduced with a small test program. Create a thread and have it sleep without calling malloc(), and process memory use increases only by the thread’s stack size. If the thread calls malloc(), however, process memory use increases by about 64 MB. This is by design in glibc 2.10 and later.
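A test program along those lines might look like the sketch below (Linux and glibc only; the names vm_size_kb(), probe_thread(), and measure_thread_growth() are hypothetical). It reads VmSize from /proc/self/status with plain open() and read() calls, because stdio functions such as fopen() call malloc() themselves and would create the very heap being measured. It also stores the allocation in a volatile pointer so the compiler cannot optimize the malloc()/free() pair away.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns the process's VmSize in kB, or -1 on error. Uses only syscalls and
   stack memory so that the measurement itself never calls malloc(). */
static long vm_size_kb(void)
{
    char buf[4096];
    ssize_t n, total = 0;
    int fd = open("/proc/self/status", O_RDONLY);

    if (fd < 0)
        return -1;
    while (total < (ssize_t)sizeof buf - 1 &&
           (n = read(fd, buf + total, sizeof buf - 1 - (size_t)total)) > 0)
        total += n;
    close(fd);
    buf[total] = '\0';
    char *p = strstr(buf, "VmSize:");
    return p != NULL ? strtol(p + 7, NULL, 10) : -1;
}

struct probe {
    int do_malloc;     /* 1: call malloc() in the new thread; 0: do not */
    long baseline_kb;  /* VmSize before the thread was created */
    long growth_kb;    /* VmSize growth observed while the thread is alive */
};

static void *volatile keep;  /* volatile so the malloc()/free() pair is not elided */

static void *probe_thread(void *arg)
{
    struct probe *p = arg;

    if (p->do_malloc)
        keep = malloc(16);              /* this thread's first allocation */
    p->growth_kb = vm_size_kb() - p->baseline_kb;
    if (p->do_malloc)
        free(keep);
    return NULL;
}

/* Runs one probe thread and stores the VmSize growth it saw. Returns 0 on success. */
static int measure_thread_growth(int do_malloc, long *growth_kb)
{
    pthread_t t;
    struct probe p = { do_malloc, vm_size_kb(), 0 };

    if (p.baseline_kb < 0 || pthread_create(&t, NULL, probe_thread, &p) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    *growth_kb = p.growth_kb;
    return 0;
}
```

Call measure_thread_growth(0, ...) first and measure_thread_growth(1, ...) second. Based on the behavior described above, the first result should be roughly the thread stack size, while the second should be on the order of 64 MB on a 64-bit system with default glibc settings; exact figures vary with the glibc version and with glibc's reuse of cached thread stacks.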

Malloc per-thread arenas in glibc

This change was introduced for scalability. Giving each thread its own heap reduces lock contention between threads, up to a limit: the number of heaps is capped at 8 times the CPU core count on 64-bit systems, or 2 times the CPU core count on 32-bit systems.

You can change this behavior with an environment variable visible to your process. If MALLOC_ARENA_MAX is set to 1 before FairCom DB starts, only one heap is created for the process, and observed memory use is what you would expect: the process size plus its memory allocations. The setting must be in place before the first malloc() call.

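It is also possible to call mallopt() with M_ARENA_MAX to set the same limit directly in your application. Example:

#include <malloc.h>
#include <stdio.h>
int main(void) {
    if (mallopt(M_ARENA_MAX, 1) != 1)   /* single heap; mallopt() returns 1 on success */
        fprintf(stderr, "mallopt(M_ARENA_MAX) failed\n");
    /* ... start application threads; this call must precede their first malloc() ... */
    return 0;
}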

FairCom engineers are evaluating whether to add a FairCom DB configuration option that makes this value tunable for specific applications. If performance profiling shows appreciable gains, look for this change in a future FairCom DB release. In the meantime, you may wish to explore how this setting affects your specific FairCom DB database applications.

How to Reproduce a Problem Using TRAPCOMM.FCS

The DIAGNOSTICS TRAP_COMM keyword instructs the FairCom Server to log incoming communication packets to a file named TRAPCOMM.FCS before they are executed. If a copy of the initial data and index files is preserved, this log can be played back using the cttrap utility and a debug build of the FairCom Server to observe the results of the client requests. This allows client activities to be duplicated exactly and repeated.
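The record-then-replay idea can be illustrated with the minimal sketch below. It does not reproduce the TRAPCOMM.FCS format, which is internal to FairCom; the length-prefixed record layout and the functions trap_request() and replay_trap() are hypothetical. Each request is appended and flushed before it would be executed, so the log still contains the request that triggered a problem, and replay feeds the records back in their original order.

```c
#include <stdint.h>
#include <stdio.h>

/* Appends one request as [uint32_t length in host byte order][payload] and
   flushes it before the request would be executed. Returns 0 on success. */
static int trap_request(FILE *trap, const void *pkt, uint32_t len)
{
    if (fwrite(&len, sizeof len, 1, trap) != 1)
        return -1;
    if (len > 0 && fwrite(pkt, 1, len, trap) != len)
        return -1;
    return fflush(trap) == 0 ? 0 : -1;
}

/* Reads the records back in order and passes each to handler (which may be
   NULL to only count them). Returns the number of records, or -1 if a
   record is truncated or larger than the replay buffer. */
static long replay_trap(FILE *trap, void (*handler)(const char *pkt, uint32_t len))
{
    char pkt[4096];
    uint32_t len;
    long count = 0;

    while (fread(&len, sizeof len, 1, trap) == 1) {
        if (len > sizeof pkt || fread(pkt, 1, len, trap) != len)
            return -1;
        if (handler != NULL)
            handler(pkt, len);
        count++;
    }
    return count;
}
```

Replay is only meaningful against the same starting data, which is why the procedure below backs up the data files before capture and restores them before replay.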

The following is a step-by-step procedure for producing a TRAPCOMM.FCS file.

These instructions are intended for a FairCom Server operator or administrator who is experiencing a problem and would like to help FairCom Support by creating a reproducible case. Please follow these steps:

Cleanly shut down the FairCom Server and remove the *.FCS files (you may want to move them to a backup directory).
Back up all the data you are going to access.
Start the FairCom Server with DIAGNOSTICS TRAP_COMM in ctsrvr.cfg.
Start your client applications and reproduce the error.
Shut down the FairCom Server.

At this point you should have a TRAPCOMM.FCS file that contains all the client requests sent to the server. You can use this file to replay the requests with the cttrap utility.

To replay TRAPCOMM.FCS, follow these steps:

Cleanly shut down the FairCom Server and remove the *.FCS files.
Restore the data that you backed up before running the TRAP_COMM instance.
Start the FairCom Server without the DIAGNOSTICS TRAP_COMM in ctsrvr.cfg.
Run cttrap, passing the path of the TRAPCOMM.FCS file as a command-line argument.

See Also: