Log entry time 00:14:37 on January 21, 2001

Entry number 55334

keyword=HOWTO get out of a CODA crash

At the end of run #1586, CODA failed to terminate the run, with ROC1,
ROC2, ROC14 apparently hanging. Tried to reboot the ROCs by hand and
do the kcoda-runcontrol-configure-download sequence, but it did not work.
Called Bob to fix the situation remotely. There is a well-defined procedure
for bringing CODA into a clean state.

1) If the crash seems to be caused by a single ROC, that ROC has to
be rebooted. After the reboot, type `i' at the ROC prompt. The process called
`coda_roc' should be in the PEND(ing) or READY state.
Then try to `reset', `configure' and `download' the trigger and start a new run.
In many cases, this should be sufficient.
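
If the output of `i' is captured as text, the state check in step 1 can be
scripted. A minimal sketch in Python, assuming the listing has
whitespace-separated columns with the task name first and the status fifth
(NAME ENTRY TID PRI STATUS ...). That column layout and the function name
coda_roc_ok are assumptions, not something recorded in this entry:

```python
def coda_roc_ok(task_listing, task_name="coda_roc",
                good_states=("PEND", "READY")):
    """Return True if task_name is listed in one of the good states.

    task_listing is the captured text of the `i' command.
    ASSUMPTION: each task line is whitespace-separated, with the task
    name in column 1 and the status in column 5.
    """
    for line in task_listing.splitlines():
        fields = line.split()
        # Skip headers, separators and lines for other tasks.
        if len(fields) >= 5 and fields[0] == task_name:
            return fields[4] in good_states
    # Task not found at all: treat as not healthy.
    return False
```

The match on the status is exact, so any other status string is reported
as not OK and should be checked by eye.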

2) If this does not help, reboot the relevant ROC, exit CODA, do a `kcoda'
and make sure that all related processes are dead. Also, check for the
file called /tmp/et_sys_tst1. If it is there, delete it. Then start runcontrol,
connect to the server, configure the trigger to `twospect', then `reset', `configure'
again, and you should be able to start the run.
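
The file check in step 2 is easy to forget at this hour. A minimal sketch in
Python of the delete-if-present step, using the path given above; the
function name remove_stale_et_file is hypothetical, and it should only be
run after `kcoda', once the CODA processes are dead:

```python
import os

# Path taken from step 2 of this entry.
ET_FILE = "/tmp/et_sys_tst1"

def remove_stale_et_file(path=ET_FILE):
    """Delete the leftover file if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False
```

Calling remove_stale_et_file() with no argument acts on /tmp/et_sys_tst1.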

3) It can happen that the connection to several ROCs seems lost.
It can then be a good idea to reboot ALL of them. In addition,
if you were too eager to kill CODA-related processes, the
component xterms may be gone as well. In this case, open an xterm in the
`components' workspace and run `setupxterms'. This sets up 6 xterms for the ROCs.
Close the ROC3 window since it is not used in this experiment.
Then log in to each of the five ROCs using the names on the window title bars:

telnet hallasfi1 / hallasfi2 / hallavme1 / halladaq1 / halladaq4

When their prompts show up, type `i' to see if the `coda_roc' processes
are in the PEND(ing) or READY state. If so, start runcontrol, connect to
server, configure the trigger, `reset', `configure' again, and you should be
able to start the run.
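
As a checklist for step 3, the five logins can be written out from the host
list. A minimal sketch in Python; the host names are the ones listed above,
while the names ROC_HOSTS and telnet_commands are hypothetical. It only
builds the command strings and does not open any connection:

```python
# Host names from step 3 of this entry (ROC3 is not used in this experiment).
ROC_HOSTS = ["hallasfi1", "hallasfi2", "hallavme1", "halladaq1", "halladaq4"]

def telnet_commands(hosts=ROC_HOSTS):
    """Return one telnet command line per ROC host, in the listed order."""
    return ["telnet " + host for host in hosts]

if __name__ == "__main__":
    # Print the commands so each can be typed into its own xterm.
    for command in telnet_commands():
        print(command)
```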

User name S. Sirca
