User name R. Michaels
Log entry time 09:10:28 on February 13, 2011
Entry number 346637
This entry is a followup to: 346627
Followups:
keyword=end-run script problems
I'm at home and not working on fixing the end-run script at the
moment but this problem has occurred before and here are the likely
reasons and ideas for solutions, and I'm sorry it was never solved
in a robust way.
I believe it's a timing problem and probably needs a re-design of
how the scripts interact.
At the end of the run a script "end_clean" is called which kills off
various logging like scalers (event type 140), fastEpicsLogger, etc.
This script is also called in getruninfo_bgr at the start of the run,
in case the previous run did not end properly. So normally end_clean at
the start of a run would do nothing, but if the last run crashed then
it cleans up. All this is good, but the problem, I think, is with killing
caget at the end, because this can crash or abort epicsRunEnd
if epicsRunEnd has not finished yet -- e.g. if it is glacially slow,
which often happens with ambitious new groups who decide they must
record a huge amount at the end of a run.
There are various ways one can think of making this design better
so that logically there's no way epicsRunEnd is aborted before it reaches
the line '$HALOGCOM $halogfile "$keyword"', which is what is happening.
Two things we've done about this historically:
1) Make epicsRunEnd *fast* (~ 1 sec). This always helps a lot.
Want a kitchen sink of variables recorded ? Put them into the script
epicsRunStart, or put into epicsLogger (and *not* into epicsRunEnd
nor fastEpicsLogger). There is no need to keep a million variables
at the end of a run.
2) In Dec 2010 there was a somewhat pathetic attempt to ameliorate
this by putting a "sleep 12" near the end of end_clean, just before
killing caget. I think it helped, but it's not a robust solution.
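One possible direction for a more robust fix, as a rough sketch only: instead
of a fixed "sleep 12", end_clean could wait for an explicit "done" marker that
epicsRunEnd writes as its very last step, and give up after a timeout. The
function name wait_for_marker, the variable DONE_MARKER and the 60 sec timeout
are all hypothetical and do not exist in the current scripts.

```shell
# Hypothetical helper (not part of the existing scripts):
# block until the file $1 exists, or until $2 seconds have passed.
# Returns 0 if the marker appeared, 1 on timeout.
wait_for_marker() {
    marker=$1
    timeout=$2
    waited=0
    while [ ! -e "$marker" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Assumed usage, with DONE_MARKER being a path both scripts agree on:
#
#   In epicsRunEnd, as the last line (after the halog submission):
#       touch "$DONE_MARKER"
#
#   In end_clean, in place of the fixed sleep, just before killing caget:
#       wait_for_marker "$DONE_MARKER" 60 || echo "epicsRunEnd timed out" >&2
#       rm -f "$DONE_MARKER"
```

With this arrangement a fast epicsRunEnd costs almost no waiting, a slow one
gets up to the full timeout, and a crashed one cannot hang end_clean forever.
The marker has to be removed after each use (as above), otherwise a stale file
from a previous run would let end_clean proceed immediately.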
See also
halog 344788 and the follow-ups.
Who will work on this ? Larry, the current script caretaker ?
If he solves it in a robust way it will be a lasting contribution,
since at some level this has been a non-stop pain for Hall A,
though not so bad for some experiments.