
    User name Ole Hansen

    Log entry time 13:59:38 on November 30, 2008

    Entry number 251485

    This entry is a followup to: 251463

    keyword=Replaced adaql10 CPU fan

    One of the two CPU fans in adaql10 (the one towards the back of the chassis) had stopped working. The machine shut down whenever the CPU temperature reached the trip point (I guess around 75 C). Without the fan, the CPU sat at 53 C when idle, and the temperature rose quickly under load. Since adaql10 is mostly idle, the problem showed up only intermittently, whenever something (e.g. an analysis job) put some load on the CPUs.

    I replaced the bad fan with the spare I happened to have. I'll order more spares, just in case.

    adaql10 should be fully reliable again, and you can return to writing the analysis output to adaql10/work1.

    The fan speeds and CPU temperatures can now be monitored in software on adaql10 using the "sensors" command (any user can do that). The temperatures reported under the headings "k8temp-pci-00cb" and "k8temp-pci-00c3" should both read in the low 30s C. The fan speeds are the ones reported as "fan1" and "fan2" from the "w83627hf-isa-0290" adapter and should both be around 5100 rpm.
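
    For anyone who wants to check this automatically, here is a rough sketch of a script that parses the text output of "sensors" and flags a stopped fan or a hot CPU. The chip and fan names are the ones quoted above. The two thresholds and the function names are my own assumptions and not settings on adaql10, and the parser assumes the usual "label: value" layout, which can differ between lm-sensors versions.

```python
import re
import subprocess

FAN_CHIP = "w83627hf-isa-0290"   # adapter name quoted in this entry
FAN_LABELS = ("fan1", "fan2")    # the two CPU fans
MIN_FAN_RPM = 4000               # ASSUMPTION: well below the normal ~5100 rpm
MAX_CPU_TEMP_C = 45.0            # ASSUMPTION: well above the normal low 30s C

# Matches "label: <number>", e.g. "fan1:  5113 RPM" or "Core0 Temp: +32.0 C"
READING = re.compile(r"\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?)")


def read_sensors():
    """Run the `sensors` command and return its text output."""
    result = subprocess.run(["sensors"], capture_output=True, text=True, check=True)
    return result.stdout


def parse_sensors(text):
    """Return {chip: {label: value}} from `sensors` text output.

    Chips are assumed to be separated by blank lines, with the chip
    name on the first line of each group.
    """
    readings = {}
    chip = None
    for line in text.splitlines():
        if not line.strip():
            chip = None              # a blank line ends the current chip
            continue
        if chip is None:
            chip = line.strip()      # first line of a group is the chip name
            readings[chip] = {}
            continue
        match = READING.match(line)
        if match:                    # lines like "Adapter: ..." do not match
            readings[chip][match.group(1).strip()] = float(match.group(2))
    return readings


def check(readings):
    """Return a list of human-readable problems (empty if all looks fine)."""
    problems = []
    fans = readings.get(FAN_CHIP, {})
    for label in FAN_LABELS:
        rpm = fans.get(label)
        if rpm is None:
            problems.append(f"{label}: no reading from {FAN_CHIP}")
        elif rpm < MIN_FAN_RPM:
            problems.append(f"{label}: {rpm:.0f} rpm is below {MIN_FAN_RPM} rpm")
    for chip, values in readings.items():
        if chip.startswith("k8temp-"):
            for label, temp in values.items():
                if temp > MAX_CPU_TEMP_C:
                    problems.append(f"{chip} {label}: {temp:.1f} C is above {MAX_CPU_TEMP_C} C")
    return problems
```

    On adaql10 this would be used as check(parse_sensors(read_sensors())); an empty list means both fans are spinning and the CPU temperatures are below the assumed limit.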

    The syslog entries regarding temperatures refer to the hard disks and can be ignored (they are routine monitoring output of smartd). I am aware of the "self-test" errors - one of the disks, while still working, seems to have become marginal. It will be replaced at the next downtime. Since this disk is mirrored, there would be no data loss even if this disk failed. No need to get nervous.



    A copy of this log entry has been emailed to: rom@jlab.org