
    User name R. Michaels

    Log entry time 09:10:28 on February 13, 2011

    Entry number 346637

    This entry is a followup to: 346627

    Followups:

    keyword=end-run script problems

    I'm at home and not working on the end-run script at the moment,
    but this problem has occurred before.  Here are the likely reasons
    and some ideas for solutions.  I'm sorry it was never solved
    in a robust way.
    
    I believe it's a timing problem and probably needs a re-design of
    how the scripts interact.
    
    At the end of the run a script "end_clean" is called which kills off
    various logging processes such as scalers (event type 140),
    fastEpicsLogger, etc.
    This script is also called in getruninfo_bgr at the start of the run,
    in case the previous run did not end properly.  So normally end_clean at
    the start of a run does nothing, but if the last run crashed then
    it cleans up.  All this is good, but the problem, I think, is with killing
    caget at the end, because this can crash / abort epicsRunEnd
    if epicsRunEnd has not finished yet -- e.g. if it is glacially slow,
    which often happens with ambitious new groups who decide they must
    record a huge amount at the end of a run.
    
    One can think of various ways to improve this design so that,
    logically, there's no way epicsRunEnd is aborted before the line
    '$HALOGCOM $halogfile "$keyword"' is executed.  An abort before
    that line is what is happening now.
    
    Two things we've done about this historically:
    
    1) Make epicsRunEnd *fast* (~ 1 sec).  This always helps a lot.
    Want a kitchen sink of variables recorded ?  Put them into the script
    epicsRunStart, or put into epicsLogger (and *not* into epicsRunEnd
    nor fastEpicsLogger). There is no need to keep a million variables 
    at the end of a run.
    
    2) In Dec 2010 there was a somewhat pathetic attempt to ameliorate 
    this by putting a "sleep 12" near the end of end_clean, just before
    killing caget.  I think it helped, but it's not a robust solution.
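
    One possible sketch of something more robust than a fixed sleep,
    under assumptions that are NOT in the current scripts: suppose
    epicsRunEnd were changed to create a marker file as its very last
    step, and end_clean waited for that marker (with a timeout) before
    killing caget.  The helper name wait_for_marker and the path
    /tmp/epicsRunEnd.done are made up for illustration only.

```shell
#!/bin/sh
# Sketch only: wait_for_marker and the marker path are hypothetical,
# not part of the existing end-run scripts.

# Wait until file $1 exists, polling once per second for at most $2 seconds.
# Returns 0 if the file appeared, 1 if the timeout expired first.
wait_for_marker() {
    marker=$1
    timeout=$2
    waited=0
    while [ ! -e "$marker" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Assumed use inside end_clean, just before killing caget:
#   wait_for_marker /tmp/epicsRunEnd.done 60 \
#       || echo "epicsRunEnd did not finish in time" >&2
#   rm -f /tmp/epicsRunEnd.done    # avoid a stale marker on the next run
#   ... kill caget as before ...
```

    With this, a fast epicsRunEnd costs no extra wait, and a slow one
    gets up to the timeout instead of a fixed 12 seconds.  The timeout
    is still needed so end_clean does not hang if epicsRunEnd crashed.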
    
    See also  
    
    halog 344788   and the follow-ups.
    
    Who will work on this ?  Larry, the current script caretaker ?
    If he solves it in a robust way it will be a lasting contribution,
    since at some level this has been a non-stop pain for Hall A,
    though not so bad for some experiments.