end-run script problems

User name R. Michaels

Log entry time 09:10:28 on February 13, 2011

Entry number 346637

This entry is a followup to: 346627

Followups:

346647 02/13/11 10:26 R. Michaels turn off kill of caget in end_clean (test)

keyword=end-run script problems

I'm at home and not working on fixing the end-run script at the 
moment but this problem has occurred before and here are the likely
reasons and ideas for solutions, and I'm sorry it was never solved
in a robust way.

I believe it's a timing problem and probably needs a re-design of
how the scripts interact.

At the end of the run a script "end_clean" is called which kills off
various logging, such as scalers (event type 140), fastEpicsLogger, etc.
This script is also called in getruninfo_bgr at the start of the run,
in case the previous run did not end properly.  So normally end_clean at
the start of a run does nothing, but if the last run crashed then
it cleans up.  All this is good, but the problem, I think, is with killing
caget at the end, because this can crash / abort epicsRunEnd
if epicsRunEnd has not finished yet -- e.g. if it is glacially slow,
which often happens with ambitious new groups who decide they must
record a huge amount at the end of a run.

There are various ways one can think of to make this design better,
so that logically there is no way epicsRunEnd can be aborted before it
reaches the line

    $HALOGCOM $halogfile "$keyword"

Being aborted before that line is what is happening now.
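One possible shape for such a design, as a minimal sketch only: epicsRunEnd
holds a lock file while it runs, and end_clean waits for that lock to clear
(with a timeout) before it kills caget.  The lock path RUNEND_LOCK and the
functions run_with_lock and wait_for_runend are hypothetical names made up
for this illustration; none of them are in the current scripts.

```shell
#!/bin/sh
# Sketch only. RUNEND_LOCK and both function names are hypothetical;
# nothing here is taken from the real epicsRunEnd or end_clean.
RUNEND_LOCK="${RUNEND_LOCK:-/tmp/epicsRunEnd.lock}"

# epicsRunEnd side: hold the lock while the wrapped command runs.
# The command stands in for everything up to and including the
# halog submission line.
run_with_lock() {
    touch "$RUNEND_LOCK"
    "$@"
    status=$?
    rm -f "$RUNEND_LOCK"
    return "$status"
}

# end_clean side: block while the lock exists, but never longer than
# $1 seconds, so a stale lock from a crashed run cannot hang cleanup.
# Returns 0 if the lock cleared, 1 on timeout.
wait_for_runend() {
    limit="${1:-60}"
    waited=0
    while [ -e "$RUNEND_LOCK" ]; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

With no lock present (the normal case when end_clean runs at the start of a
run) wait_for_runend returns immediately, so the start-of-run cleanup is not
slowed down.  This does not cover the case where end_clean runs before
epicsRunEnd has even created the lock; that ordering would still have to be
guaranteed by whatever calls the two scripts.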

Two things we've done about this historically:

1) Make epicsRunEnd *fast* (~ 1 sec).  This always helps a lot.
Want a kitchen sink of variables recorded?  Put them into the script
epicsRunStart, or put them into epicsLogger (and *not* into epicsRunEnd
or fastEpicsLogger).  There is no need to keep a million variables
at the end of a run.

2) In Dec 2010 there was a somewhat pathetic attempt to ameliorate
this by putting a "sleep 12" near the end of end_clean, just before
killing caget.  I think it helped, but it's not a robust solution:
it fails whenever epicsRunEnd takes longer than 12 sec.
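A fixed sleep could in principle be replaced by a bounded wait that returns
as soon as epicsRunEnd has exited.  The following is only a sketch of that
idea: wait_for_pid is a hypothetical helper, and how end_clean would learn
the PID of epicsRunEnd (a pid file, for example) is an open question that
this sketch does not answer.

```shell
#!/bin/sh
# Sketch only: wait_for_pid is a hypothetical helper, not part of end_clean.
# It assumes the watched process belongs to the same user, since
# "kill -0" also fails for processes we are not allowed to signal.

# Wait until process $1 has exited, polling once per second for at most
# $2 seconds (default 12, matching the old fixed sleep).
# Returns 0 if the process is gone, 1 if it is still alive at the limit.
wait_for_pid() {
    pid="$1"
    limit="${2:-12}"
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Compared with "sleep 12", this costs nothing when epicsRunEnd is already
done and lets the limit be raised for slow end-of-run scripts.  A return
value of 1 tells end_clean that it is about to kill caget underneath a
script that is still running, which could at least be logged.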

See also halog 344788 and the follow-ups.

Who will work on this?  Larry, the current script caretaker?
If he solves it in a robust way it will be a lasting contribution,
since at some level this has been a non-stop pain for Hall A,
though not so bad for some experiments.