
    User name W. Deconinck

    Log entry time 10:47:43 on October 28, 2009

    Entry number 296883

    keyword=Power outage: lessons learned

    A "lessons learned" meeting about the power outage was held this morning
    at MCC. It was scheduled after the first power outage, but became even
    more relevant yesterday. Many issues with the procedures were identified
    (but nothing major). For the halls, these are the relevant ones:
    - the pager system did not seem to work properly for many of the OPS
    out-pages; secondary contact information is important
    - some people were paged in and then sat idle until a while after power
    was restored, because of access restrictions and network restoration
    - ARMs were in short supply; extra operators should be called in when
    this happens
    - there should have been a notification of the switch-back from the
    generator to the 40 MVA power
    - a lot of time was wasted on writing atlist entries, or they were not
    written at all; a baseline 'power recovery' atlist will probably be
    developed, with other atlists added for tasks beyond its scope

    In particular for Hall A, I think the following could have been useful:
    - flow chart with actions to take when power is out (admittedly, that
    would be an almost empty page and would depend on the experiment: page
    RC, page target, page tech). After the first power outage the
    target-on-call was not notified, which probably led to our target fan
    motor failure
    - we should keep a local copy of the 'staff' database with pager numbers
    in case the network to the main site goes down too (it didn't this time)
    - the halog should stay functional while power is off; it was available,
    but new entries were posted only after a while, although they were
    visible in the preview section. The OPS elog was much better in this
    respect.
    - the procedure for access to the halls will be improved. ODH risks need
    to be considered, e.g. if only the dome reads O2 < 18%, does that mean
    the hall has to stay closed completely? PSS might be put on the generator
    instead of just locking down access. More ARMs will be made available
    for critical hall work.
    - instructions and check lists for when power comes back on, e.g. check
    on necessary DAQ services, check on VME and HV crates if possible;
    basically information the shift crew can collect to lighten the task of
    the experts and to make sure we don't overlook anything before we start
    - reboot DAQ and computers a while after power is back, to ensure that
    they pick up services (nfs, nis) that might still have been down
    right after power was restored
    - while beam is being restored, attempt a complete start-up procedure:
    ramping magnets, moving target, setting HV, taking pedestal runs
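
    The power-recovery check list suggested above could be kept as a small
    script that the shift crew runs and pastes into the log. What follows is
    only a rough sketch under my own assumptions: the names CheckItem,
    run_checklist, outstanding and format_report are made up for
    illustration, and the mount points in the example are placeholders, not
    the actual Hall A configuration. Each check is just a function that
    returns True or False; a check that raises an error is counted as failed
    so that one broken test does not stop the rest of the list.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckItem:
    """One line of the check list (hypothetical structure)."""
    name: str
    check: Callable[[], bool]


def run_checklist(items: List[CheckItem]) -> List[Tuple[str, bool]]:
    """Run every check in order; an exception counts as a failure."""
    results = []
    for item in items:
        try:
            ok = bool(item.check())
        except Exception:
            ok = False
        results.append((item.name, ok))
    return results


def outstanding(results: List[Tuple[str, bool]]) -> List[str]:
    """Names of the items that still need attention."""
    return [name for name, ok in results if not ok]


def format_report(results: List[Tuple[str, bool]]) -> str:
    """Plain-text report, one line per item, suitable for a log entry."""
    return "\n".join(
        f"[{'OK' if ok else 'FAIL'}] {name}" for name, ok in results
    )


if __name__ == "__main__":
    # Placeholder paths: replace with the NFS mounts the DAQ really needs.
    example = [
        CheckItem("NFS home mounted", lambda: os.path.ismount("/example/home")),
        CheckItem("NFS data mounted", lambda: os.path.ismount("/example/data")),
    ]
    print(format_report(run_checklist(example)))
```

    Items that need a human (VME and HV crates, target status) could be
    added as checks that simply ask the operator for a yes or no; the
    experts would then get one report instead of several phone calls.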


    A copy of this log entry has been emailed to: rom, reimer, meekins