Troubleshooting Replication

These tips can help you troubleshoot some common replication issues.

Monitoring Replication Status

To check the status of a replication process, select Administration > Replication > Code Replication or Administration > Replication > Data Replication, and find the process in the list.

Replication Logs

The replication process records log files on both the source and target systems, separately from the regular error logs. The files are stored at https://instance_address/on/demandware.servlet/webdav/Sites/Logs/, with filenames like staging-blade_name-appserver-yyyymmdd.log. To view the log files in Business Manager, select Administration > Site Development > Development Setup > Log Files.

Note: All the log file names contain "staging", regardless of the instance type.
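
If you download a directory listing and want to pick out the newest replication log, the filename format above can be matched with a pattern. The following is a minimal sketch in Python, not an official tool. The function names are hypothetical, and the sketch assumes that the blade name can contain any characters.

```python
import re
from datetime import datetime

# Pattern derived from the documented format
# staging-blade_name-appserver-yyyymmdd.log.
# Which characters a blade name can contain is an assumption.
LOG_NAME = re.compile(r"^staging-(?P<blade>.+)-appserver-(?P<day>\d{8})\.log$")


def parse_log_name(name):
    """Return (blade_name, date) for a replication log filename, or None."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    try:
        day = datetime.strptime(match.group("day"), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a valid calendar date.
        return None
    return match.group("blade"), day


def most_recent_log(names):
    """Return the replication log filename with the latest date, or None."""
    dated = []
    for name in names:
        parsed = parse_log_name(name)
        if parsed is not None:
            dated.append((parsed[1], name))
    return max(dated)[1] if dated else None
```

Names that do not match the pattern, such as regular error logs, are ignored, so the full listing of the Logs directory can be passed in unchanged.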

Hung Replication

Some database transactions, especially those involving catalog data, can take a while to complete depending on the amount of data involved. If the replication remains in the running state for longer than expected, you can check whether it is hung.

  1. Open the most recent replication log on the staging instance.
    • Check whether it contains the line "Staging pipeline in live system successfully called." If it doesn't, there is a problem.
    • Check whether it includes an entry that a state is set to "ErrorAcquiringEditingLocks." If so, it’s possible that resource locks from a previous replication process were not released, which can hang the replication.
  2. Open the most recent replication log on the target instance, and scroll to the end.
    • Refresh the view a few times to determine whether new entries are being added. If no new entries appear after a while, the replication might be hung.
    • Check whether it includes an entry that a state is set to ErrorAcquiringLivelocks. If so, it’s possible that resource locks from a previous replication process were not released, which can hang the replication.
    • If the last log entry is a database action (for example, INSERT or ALTER INDEX), check earlier logs. Examine how long that action usually takes and what log entries usually follow it.
    • If the last log entry starts with Rsync, the delay can be due to a large number of changed static content files. Files that have been moved to a different folder are included, even if their content is the same. If the Rsync is stuck, contact Support to check its status.
    • If the log shows the state ErrorLiveStagingProcessKilled, the replication is probably hung due to a concurrent deployment or instance restart.
  3. If either log contains a line with something similar to "resource busy and acquire with NOWAIT specified," open a ticket with Support. In the ticket, provide the troubleshooting steps that you have attempted.
  4. If the replication process shows as completed on the target instance but is still waiting or in progress on staging, check the staging instance. It might have been down when the replication finished. Restart the staging instance, and check the status again.
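
The text checks in steps 1 through 3 can be sketched as a small Python script that you run against a downloaded copy of a log. This is an illustrative sketch only. It searches for the strings quoted in the steps above and assumes nothing else about the log layout. The function name and the wording of the findings are hypothetical.

```python
# Line that the staging log is expected to contain (step 1).
STAGING_OK = "Staging pipeline in live system successfully called"

# Strings named in the steps above, mapped to what they suggest.
MARKERS = {
    "ErrorAcquiringEditingLocks": "resource locks might not have been released",
    "ErrorAcquiringLivelocks": "resource locks might not have been released",
    "ErrorLiveStagingProcessKilled": "possible concurrent deployment or restart",
    "resource busy and acquire with NOWAIT specified": "open a Support ticket",
}


def hung_indicators(log_text, is_staging_log):
    """Return a list of findings for one replication log."""
    findings = []
    if is_staging_log and STAGING_OK not in log_text:
        findings.append("staging pipeline call not confirmed")
    for marker, meaning in MARKERS.items():
        if marker in log_text:
            findings.append(f"{marker}: {meaning}")
    # Step 2 looks at the last entry of the target log. This check assumes
    # the entry text begins the line; strip any prefix first if yours has one.
    entries = [line for line in log_text.splitlines() if line.strip()]
    if not is_staging_log and entries and entries[-1].lstrip().startswith("Rsync"):
        findings.append("last entry is Rsync: many changed files, or Rsync is stuck")
    return findings
```

An empty result does not prove that the replication is healthy. The check for whether new entries are still being added, and the comparison with earlier logs, still have to be done by reading the log over time.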

If you determine that the replication is hung, use Control Center to restart the staging instance. Make sure that the hung replication has stopped by verifying that its status on the staging instance is Failed. When it has stopped, rerun the replication.

If the replication hangs again, you can try restarting the target instance, then restarting the source instance, then rerunning the replication. However, restarting the target instance disrupts all running jobs, returns errors for all storefront requests, and clears all caches. Restart a production instance only as a last resort.

If the replication still hangs, open a Support ticket and provide the troubleshooting steps that you have attempted.

Post-Replication Issues

If you encounter a problem after a data replication operation, take the following steps.

  1. Examine the replication logs on the staging and target instances for error messages. Look for entries that include "failed" or "ORA-". These messages can help determine the problem.
  2. If the replication logs do not provide helpful data, check the error logs.
  3. Try to isolate the replication task that caused the problem by running tests with and without each task.
  4. If replication of multiple objects fails, try replicating individual objects to narrow the cause.
  5. If a scheduled replication did not run, try to run it manually.
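
The log search in step 1 can be sketched in Python as follows. This is a minimal example, not an official tool. It only looks for the two strings that step 1 names, and the function name is hypothetical. Matching "failed" without regard to case is an assumption; "ORA-" is matched exactly to avoid hits inside ordinary words.

```python
import re

# "failed" in any letter case, or the literal prefix "ORA-".
ERROR_PATTERN = re.compile(r"(?i:failed)|ORA-")


def error_entries(lines):
    """Yield (line_number, text) for lines that mention "failed" or "ORA-"."""
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            yield number, line.rstrip("\n")
```

Because the function accepts any iterable of lines, an open file handle for a downloaded log can be passed in directly, and the line numbers in the result make it easier to find the surrounding entries in the full log.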