HV HowTo for Experts

From Hall A Wiki
Revision as of 09:53, 15 April 2014

High Voltage in Hall A

First of all, please be aware of the simple instructions for users:

https://hallaweb.jlab.org/wiki/index.php?title=HowTo_for_Users

Overview of Architecture

The HV crates sit in the hall, e.g. on the detector stacks. Cards with (usually) 12 channels of negative or positive HV are inserted into the crates. A custom "serial board" (built by Javier Gomez and Jack Segal) talks to the cards. This "serial board" replaces an old, obsolete motherboard. (A few crates still have this motherboard, however, e.g. the beamline crate.)

A Perl "Shim" server (written by Brad Sawatzky) runs on a PC near the HV crate. Through the Perl "Expect" module, the Shim server drives a low-level C program, written by Javier, that talks to his serial board in the HV crate. On the user end, a Java GUI (written by Roman Pomatsalyuk) displays the HV information and provides user control. This Java GUI talks to the Shim server. Alternatively, the Java GUI can talk to the motherboard via a Portserver.

The portserver/motherboard alternative is being phased out. As I write this (Jan 2014), it still exists in two places: the beamline HV and one of the HV crates in the test lab DVCS setup.

All the elements in this chain must be working in order to have control and readback of HV.
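Because every element of the chain has to work, it helps to test the links one at a time. The Java GUI talks to the Shim server over telnet (see the telnet notes under "Setting Up A New Installation"). A good first check from the counting room is therefore whether port 23 on the Shim PC accepts TCP connections. Below is a minimal bash sketch. The function names are hypothetical and not part of the installed software. It assumes bash was built with /dev/tcp support and that a `timeout` command (GNU coreutils) is available.

```shell
#!/bin/bash
# port_open HOST PORT: hypothetical helper, not part of the HV software.
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device and the GNU coreutils "timeout".
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "</dev/tcp/$host/$port" 2>/dev/null
}

# check_shim_hosts HOST...: report whether the telnet port (23), which the
# Java GUI uses to reach the Shim server, is open on each HOST.
check_shim_hosts() {
    local h
    for h in "$@"; do
        if port_open "$h" 23; then
            echo "$h: telnet port open"
        else
            echo "$h: no telnet connection (PC down, or telnet/xinetd not running?)"
        fi
    done
}
```

Usage: `check_shim_hosts intelha3 ahut1 halladaq8`. An open port only shows that the telnet service answers. You still need to check the Shim server, the C front-end, the serial cable and the crate separately.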

There are three ways to control HV:

1. Using the Java GUI and the software chain mentioned above.

2. Using the "i2lchv_linux_XXX" C code in shim/LecroyHV_shim/LecroyHV_FE. At the moment, XXX = bob on ahut1 and ahut2, XXX = S2 on intelha3, and XXX = S0 on halladaq8.

3. hvctrl code -- explained in a section below.
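For method 2, the program name depends on the computer. The bash sketch below encodes the host-to-suffix mapping listed above: bob on ahut1/ahut2, S2 on intelha3, S0 on halladaq8. Three things in it are assumptions:

* The function name is hypothetical.
* It assumes `hostname -s` returns those short names.
* It assumes the copy to run is under $HOME/shim. That is /home/ahut/shim on the laptops and the writeable /root/shim on the Intel PCs, as described elsewhere on this page.

```shell
#!/bin/bash
# fe_binary [HOST]: hypothetical helper, not part of the HV software.
# Prints the path of the i2lchv_linux_XXX front-end for HOST
# (default: this computer). Mapping copied from the list above.
fe_binary() {
    local host=${1:-$(hostname -s)}
    local suffix
    case "$host" in
        ahut1|ahut2) suffix=bob ;;
        intelha3)    suffix=S2 ;;
        halladaq8)   suffix=S0 ;;
        *)
            echo "fe_binary: no front-end known for host '$host'" >&2
            return 1 ;;
    esac
    # Assumption: run from the user's own (writeable) shim tree.
    echo "$HOME/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_$suffix"
}
```

Usage: `fe=$(fe_binary) && "$fe"`. If it runs OK, the crate is on and talking (compare item 6 under "What can go wrong ?").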

Existing Crates

Here is a list of HV crates in Hall A and the Test Lab as of Jan 2014.

The crates that use portservers are talking to the old motherboard. The crates that use a PC with Shim have no motherboard. See Architecture above.

Location                  Portserver, or PC for Shim   Config in ~/slowc   How to start on adev
Left HRS (1 crate)        Intel PC: intelha3           LEFT                cd ~adev/slowc ; ./hvs LEFT
Right HRS (top crate)     laptop: ahut1                RIGHT               (started with the bottom crate)
Right HRS (bottom crate)  Intel PC: halladaq8          RIGHT               cd ~adev/slowc ; ./hvs RIGHT
                                                                           (this starts both R-HRS crates)
Beamline                  portserver hatsv4: 2003      BEAMLINE            cd ~adev/slowc ; ./hvs BEAMLINE
Test Lab                  1 crate on portserver        DVCS                cd ~/slowc ; ./hvs DVCS
                          1 crate on a PC                                  (see the README file)

Where the Servers are

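The Intel PCs share /root/diskless/i386/Centos5.8/root, which becomes their root partition. The software at /root/diskless/i386/Centos5.8/root/shim/LecroyHV_shim therefore appears on each Intel PC as /shim/LecroyHV_shim. The low-level C code is in ./LecroyHV_FE below that.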

Software that runs on the Intel PCs must, however, run from their local disk because output is not permitted on the root partition.

A cron job ensures that the server is running:

[root@intelha3 ~]# crontab -l
# Start the shim server for HV
2,10,20,30,40,50 * * * * /shim/scripts/prepHV
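With the schedule above, prepHV runs at minutes 2, 10, 20, 30, 40 and 50 of every hour. Assuming prepHV restarts a missing server as soon as it runs, the longest wait after the server dies is the 12-minute gap from :50 to :02. The bash sketch below checks that an active prepHV entry is present and computes how long until the next run. The helper names are hypothetical, not part of the installed scripts.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the installed software.

# has_prephv_cron: read "crontab -l" output on stdin and succeed if an
# active (uncommented) line runs prepHV.
has_prephv_cron() {
    grep -v '^[[:space:]]*#' | grep -q 'prepHV'
}

# minutes_to_next_run MINUTE: minutes from MINUTE (0-59) until the next
# run of the schedule "2,10,20,30,40,50 * * * *".
minutes_to_next_run() {
    local now=$((10#$1)) m     # 10# so that "08" from date is not octal
    for m in 2 10 20 30 40 50; do
        if [ "$m" -gt "$now" ]; then
            echo $((m - now))
            return 0
        fi
    done
    echo $((60 - now + 2))     # next run is minute 2 of the next hour
}
```

Usage: `crontab -l | has_prephv_cron && minutes_to_next_run "$(date +%M)"`.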

On the "ahut" laptops, it's a bit different. The software is in /home/ahut/shim/LecroyHV_shim, and on ahut* a cron script (prepHV) runs under the ahut account to ensure the HV server is running.

On the DVCS computer in the test lab, you have to start everything by hand. Go to ~/slowc and see the README file.

Restarting the Servers

The biggest problem we have at the moment is that the Java GUI becomes disconnected from the servers and/or there are alarms related to this disconnect. Here I explain how to reconnect.

On ahut* the servers are run from the ahut account. If you need to restart, first check whether the old server is still running:

[ahut@ahut2 ~]$ ps awx | grep -i shim

and "kill -9" its PID. The server will then either restart on its own within a few minutes (there's a cron job), or you can restart it by hand:

/home/ahut/scripts/startHV

On the Intel PCs we do not (yet) have an ordinary account, only root. The simplest way to restart the server is to reboot the Intel PC. However, if you know the root password, you can follow a procedure similar to the one above: kill the "shim" server and restart it. On the Intel PCs the HV is started with the command /shim/scripts/prepHV, which runs as a cron job under root.
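The restart procedure above can be wrapped in a small bash script. This is a hypothetical sketch: the file name restart_hv.sh and the function names are not part of the installed software. It assumes every server process has "shim" (any case) somewhere in its command line, which is what the `ps awx | grep -i shim` check relies on. Note that the plain `grep -i shim` also lists the grep process itself. The `[s]him` pattern below avoids matching its own awk process.

```shell
#!/bin/bash
# restart_hv.sh: hypothetical helper, not part of the installed software.
# Source it, then call restart_shim (as ahut on the laptops, or as root
# on the Intel PCs). Assumes the server's command line contains "shim".

# shim_pids: read "ps awx" output on stdin, print PIDs of shim processes.
# The [s]him pattern does not match awk's own command line.
shim_pids() {
    awk 'tolower($0) ~ /[s]him/ { print $1 }'
}

# restart_shim [START_CMD]: kill -9 any running shim processes, then start
# the server. START_CMD defaults to the ahut start script from this page.
restart_shim() {
    local start_cmd=${1:-/home/ahut/scripts/startHV}
    local pids
    pids=$(ps awx | shim_pids)
    if [ -n "$pids" ]; then
        echo "Killing shim PIDs: $pids"
        kill -9 $pids    # unquoted on purpose: one argument per PID
        sleep 2          # give the processes a moment to disappear
    fi
    # Alternatively, skip this and wait for the cron job to restart it.
    "$start_cmd"
}
```

Usage on ahut*: `source restart_hv.sh; restart_shim`. On an Intel PC (as root), `restart_shim /shim/scripts/prepHV` would follow the same steps with the prepHV command named above.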

hvctrl standalone local control

Note: Although the Java GUI is preferred for SAFETY reasons, I'll explain here an alternative way to control HV. It cannot control every parameter yet, only the basic ones: turning a card on/off, enabling/disabling a channel, and reading/writing a voltage value.

On the ahut computers, go to /home/ahut/hvctrl

On the Intel PCs it's a little different. Use /hvctrl/hvctrl; the input/output files are in /root/hvctrl. These have to be different because /hvctrl is part of a disk area shared by all Intel PCs, which is not writeable, while /root/hvctrl is a writeable area local to each Intel PC.

If you run the code "hvctrl", its usage is self-explanatory, and there is normally a README file in those directories. The code talks to the HV crate to which that PC is connected via a serial cable. If we like this code we may expand on it. I find it useful for quick tests. However, since not every parameter can be modified, and since it doesn't have a GUI, it's not as good as the Java GUI control.

Setting Up A New Installation

If you install on an Intel PC in Hall A, note that these share a root partition, so they all see the software. Suppose, however, that you want to install on a new PC like a laptop. That's what these instructions are for.

1. You need Red Hat 5 or later in order to have a suitable glibc.

2. You need Perl 5.8.8 or later installed. On the Intel PCs this appears as /perl, after being installed (on adaql1) in the shared area.

3. Set write permission for /dev/ttyS0 (or is your PC using /dev/ttyS2?). Typically the permissions get reset when the PC is rebooted, so that users cannot write there. Wrong permissions make the software fail silently.

4. You need a telnet server, because the network connection is via telnet.

Install the telnet server by typing "yum install telnet-server", then enable telnet: /etc/xinetd.d/telnet needs "disable = no", and you need to restart xinetd:

/sbin/service xinetd restart

Note: if telnet mysteriously stops working, kill cfengine, the Computer Center's rather rude security script that deletes /etc/xinetd.d/telnet. Yes, telnet is an old, insecure protocol, but this software needs it: the Java code uses telnet to communicate with the Perl server.

On ahut1 and ahut2, I have a cron script /root/scripts/prepHV which takes care of running the server automatically. It also periodically restores the "telnet" file mentioned above and restarts xinetd.

Here is a simple test of the telnet server: if you are on, for example, ahut2 and can "telnet ahut2" (i.e. telnet into yourself), then the server is running.

5. Install the software. Assuming you don't need the Java code (it runs on adaql1 in the counting room), there are two pieces: the Shim Perl server and the low-level frontend C code (see "Overview of Architecture" above). You'll need to find the tar file for this, which is in the MSS:

/mss/home/rom/hv_ahut2_14apr14.tar

You should always pick the latest date (the date is usually written in the filename) because I tend to update / improve things.
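Steps 3 to 5 can be partly checked with a script. The bash sketch below is hypothetical: the function names and the MSS listing method are assumptions. It provides three checks:

* `serial_writable` catches the silent failure from step 3. Run it as the account that runs the server, because root passes the check regardless of permissions.
* `telnet_enabled` looks for the `disable = no` line from step 4.
* `latest_tarball` picks the newest file when the date is written as DDmonYY just before .tar, as in hv_ahut2_14apr14.tar. It assumes two-digit years mean 20YY.

```shell
#!/bin/bash
# Hypothetical installation checks; not part of the shipped software.

# serial_writable [DEV]: succeed if the current user can write DEV
# (default /dev/ttyS0; some PCs use /dev/ttyS2).
serial_writable() {
    local dev=${1:-/dev/ttyS0}
    if [ -w "$dev" ]; then
        return 0
    fi
    echo "WARNING: $dev not writeable by $(id -un); the HV software will fail silently" >&2
    return 1
}

# telnet_enabled [FILE]: succeed if the xinetd telnet file contains an
# active "disable = no" line.
telnet_enabled() {
    local f=${1:-/etc/xinetd.d/telnet}
    grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$f" 2>/dev/null
}

# latest_tarball: read file names on stdin and print the one with the
# newest DDmonYY date just before ".tar" (e.g. hv_ahut2_14apr14.tar).
latest_tarball() {
    awk '
    BEGIN {
        split("jan feb mar apr may jun jul aug sep oct nov dec", mon, " ")
        for (i = 1; i <= 12; i++) num[mon[i]] = i
    }
    match(tolower($0), /[0-9][0-9]?[a-z][a-z][a-z][0-9][0-9]\.tar$/) {
        d = substr(tolower($0), RSTART, RLENGTH - 4)   # e.g. "14apr14"
        n = length(d)
        mm = num[substr(d, n - 4, 3)]
        if (mm) {
            # YYMMDD string; sorts correctly as long as all years are 20YY
            key = sprintf("%02d%02d%02d", substr(d, n - 1, 2), mm, substr(d, 1, n - 5))
            if (key > best) { best = key; file = $0 }
        }
    }
    END { if (file != "") print file }'
}
```

Usage: `serial_writable /dev/ttyS2`, and `telnet_enabled || echo "set disable = no, then restart xinetd"`. Assuming the MSS stub directory can be listed with ls, `ls /mss/home/rom/ | latest_tarball` picks the newest tar file.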

What can go wrong ?

Common troubleshooting items are listed in the User Guide (search for "troubleshooting"):

https://hallaweb.jlab.org/wiki/index.php?title=HV_HowTo_for_Users

Below is a list of other problems I've seen and their solution.

1. Power-cycling an HV crate has been known to help, especially if more than one HV card does not respond. In that case it's probably a crate problem, not a card problem.

2. Cannot login to intelPC as root. Probably because you don't have /root, in which case the solution is to mount it by doing the following as superuser on adaql1: "cd /root ; ./mount_diskless". (but I think this has been fixed now)

3. No connection to server. Try restarting the server; instructions are in item 5 below.

4. If you try to run the Shim software by hand, you might see "Can't connect to mainframe", and then the Shim script silently dies! This is a good one. The problem is that on the Intel PCs the /shim directory where the software lives is not writeable, so you must run from /root/shim, which is a local, writeable disk area.

5. Restarting the servers.

This was a big problem for a while, but I think it has largely gone away (see item 8). The symptom is that the Java GUI loses connection and/or emits false alarms.

On ahut* the servers are run from the ahut account. If you need to restart:

[ahut@ahut2 ~]$ ps awx | grep -i shim

and "kill -9" its PID. The server will then either restart on its own within a few minutes (there's a cron job), or you can restart it by hand:

/home/ahut/scripts/startHV

On the Intel PCs we do not (yet) have an ordinary account, only root. The simplest way to restart the server is to reboot the Intel PC. However, if you know the root password, you can follow a procedure similar to the one above to kill and restart the HV. On the Intel PCs the HV is started with the command /shim/scripts/prepHV, which runs as a cron job under root.

6. Another possibility is that the mainframe really cannot be reached. The first thing to try is to run, by hand, the ./shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_XXX program for that computer (XXX = bob on ahut, S0 on halladaq8, S2 on intelha3). If that runs OK, the HV crate is on and talking. If not, check your cabling and power status.

7. On Feb 20, 2014 I found I could not raise HV on any of the 4 cards in a particular crate. Power cycling, pulling out the cards, reseating them: nothing helped. I tested the 4 cards in another crate and they worked. There are some halogs about this. Finally, I put the cards back into the original crate but in different slots, and they worked again. This led to much speculation about bad slots, bent pins, poor contact, and possible temperature effects (meaning it will happen again?).

8. On Feb 25, 2014 it was reported that cards 4, 6, and 9 in the upper HV crate on the R-HRS did not appear in the list, despite repeated reboots of the server. What I did to recover: a) save HV settings, b) turn off HV in software, c) turn off HV in hardware (two switches), d) turn on HV again. The cards came back. I admit this is not very satisfying, though. Later we found that the "fast" version of the software runs fast but eventually causes cards to be disabled, such that only a power-cycle restores them. The "slow" version that runs on an Intel PC works reliably for many days, so we've reverted to that version on all platforms. Slow, but steady.
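Item 4, like the /root/hvctrl note in the hvctrl section, comes down to running from a directory you can write to. Below is a hypothetical bash guard, not part of the installed software, that could be run before starting the Shim server by hand. It actually tries to create and remove a scratch file rather than trusting permission bits alone.

```shell
#!/bin/bash
# require_writable_dir [DIR]: hypothetical guard, not part of the HV software.
# Fails (with a hint) unless a scratch file can be created in DIR
# (default: the current directory); the scratch file is removed again.
require_writable_dir() {
    local dir=${1:-$PWD}
    local probe="$dir/.hv_write_test.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    echo "ERROR: cannot write in $dir; on an Intel PC run from /root/shim, not /shim" >&2
    return 1
}
```

Usage: `cd /root/shim && require_writable_dir`, then start the server as usual if the check passes.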