HV HowTo for Experts

From Hall A Wiki

Revision as of 15:40, 15 September 2014

High Voltage in Hall A

First of all, please be aware of the simple instructions for users:

https://hallaweb.jlab.org/wiki/index.php?title=HowTo_for_Users

Overview of Architecture

The HV crates sit in the hall, e.g. on the detector stacks. Cards with (usually) 12 channels each of negative or positive HV are inserted into the crates. A custom "serial board" (built by Javier Gomez and Jack Segal) talks to the cards. This "serial board" replaces an old, obsolete motherboard. (There are still a few crates with this motherboard, however -- e.g. the beamline crate.) A Perl "Shim" server (written by Brad Sawatzky) runs on a PC near the HV crate. The "Shim" server uses, via a socket connection, low-level C code written by Javier to talk to his serial card in the HV crate. On the User end, a Java GUI (written by Roman Pomatsalyuk) displays the HV information and provides User control. This Java GUI talks to the Shim server. Alternatively, the Java GUI can talk to the motherboard via a Portserver.

Since the summer of 2014, the nearby PC where the Shim server runs has been a Raspberry Pi (rPI) board, a small PC that sits in the crate. This work, done by Javier Gomez, Roman Pomatsalyuk, and Chuck Long, has made the server much faster and more stable.

The portserver/motherboard alternative is being phased out, but still exists in at least one place at the moment: the beamline HV crate. It may be in some older setups elsewhere, too.

Existing Crates

Here is a list of HV crates in Hall A as of Sept 2014.

The crates that use portservers are talking to the old motherboard. The crates that use an rPI PC with Shim have had their motherboards removed. See Architecture above.

Location                    Portserver, or PC for Shim    Config on ~/slowc    How to start on adev
Left HRS (1 crate)          rpi8                          LEFT                 cd ~adev/slowc ; ./hvs LEFT
Right HRS (bottom crate)    rpi7                          RIGHT                cd ~adev/slowc ; ./hvs RIGHT (this starts both R-HRS crates)
Right HRS (top crate)       rpi4                          RIGHT                cd ~adev/slowc ; ./hvs RIGHT (this starts both R-HRS crates)
Beamline                    portserver hatsv4: 2003       BEAMLINE             cd ~adev/slowc ; ./hvs BEAMLINE


Restarting the Servers

The servers can run on a PC with a serial connection. As of Sept 2014, all the HRS servers are running on Raspberry Pi PCs.

A cron script ensures that the server is running:

pi@rpi1 ~ $ crontab -l
# cron jobs
5,10,15,20,25,30,35,40,45,50,53,55,58 * * * * /home/pi/scripts/start_hv_cron

The start_hv_cron script checks if the server is running. If it is running, the script does nothing. If it is not running, the script starts it. This means that the server should always be running (within about 5 minutes of restarting the rPI), and you should not have to do anything.

However, I suppose it's possible that the server needs to be killed and restarted (though I haven't seen it yet). In that case, log in as "pi" on the rPI board in question, find the process, and kill it. Alternatively, reboot the rPI and wait 5 minutes.
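
The check-then-start logic described above can be sketched as follows. This is not the actual start_hv_cron script: ensure_running is a hypothetical helper, and the check and start commands in the usage comment are assumptions chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical watchdog helper (a sketch, not the real start_hv_cron).
# $1: command that exits 0 if the server is already running
# $2: command that starts the server
ensure_running() {
    check_cmd=$1
    start_cmd=$2
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        # Server found: leave it alone.
        echo "already running"
    else
        # Server not found: start it in the background and return at once.
        sh -c "$start_cmd" >/dev/null 2>&1 &
        echo "started"
    fi
}

# Example call from a cron-driven script (both commands are assumptions):
#   ensure_running "pgrep -f shim" "/path/to/start_server"
```

Running such a helper every few minutes from cron is harmless when the server is up, which is why a dead server comes back without manual intervention.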



Setting Up A New Installation

If you install on an Intel PC in Hall A, note that these share a root partition, so they all see the software. Suppose, however, that you want to install on a new PC like a laptop. That's what these instructions are for.

1. Red Hat 5 or later is needed in order to have a suitable glibc.

2. Perl 5.8.8 or later must be installed. On the Intel PCs it appears as /perl after being put (on adaql1) in the shared area.

3. Set write permission for /dev/ttyS0 (or is your PC using /dev/ttyS2 ?). Typically the permissions get reset when the PC is rebooted, such that users cannot write there. A missing write permission causes a silent failure of the software.

4. A telnet server is needed because the network connection is via telnet.

Install the telnet server by typing "yum install telnet-server" and allow telnet as follows: /etc/xinetd.d/telnet needs "disable = no", and you need to restart xinetd with "/sbin/service xinetd restart".

Note: if telnet mysteriously stops working, kill cfengine, which is Computer Center's rather rude security script that deletes /etc/xinetd.d/telnet. Yes, telnet is an old, insecure protocol, but it's needed by this software. The Java code uses telnet to communicate with the Perl server.

On ahut1 or ahut2, I have a cron script /root/scripts/prepHV which takes care of running the server automatically. It also periodically restores the "telnet" file mentioned above and periodically restarts xinetd.

Here is a simple test of the telnet server: if you are on, for example, ahut2 and can "telnet ahut2" (i.e. telnet into yourself), then the server is running.

5. Install the software. Assuming you don't need the Java code because it runs on an adaq machine in the counting room, there are two pieces: the Shim Perl server and the low-level frontend C code (see "Overview of Architecture" above). You'll need to find the tar file for this, which is in the MSS:

/mss/home/rom/hv_ahut2_14apr14.tar

For the rPI, the minimum installation kit is in /mss/home/rom/rpi_minimal_9sep14.tar, but there are more complete versions of this which include Perl, etc. Ask me if you need that.

You should always pick the latest date (the date is usually written in the filename) because I tend to update / improve things.
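
Steps 3 to 5 lend themselves to small scripted checks. The sketch below is an illustration under stated assumptions, not part of the installed software: all function names are hypothetical, the telnet check assumes the usual xinetd "disable = no" syntax, and the date parsing assumes filenames end in _<day><mon><yy>.tar with a two-digit year meaning 20yy, as in the two tar files named above.

```shell
#!/bin/sh
# Step 3: report whether a serial device exists and is writable by this user.
check_serial_writable() {
    dev=$1
    if [ ! -e "$dev" ]; then echo "missing"; return 2; fi
    if [ -w "$dev" ]; then echo "writable"; return 0; fi
    echo "not writable"; return 1
}

# Step 4: succeed if an xinetd service file contains "disable = no".
telnet_enabled() {
    grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$1" 2>/dev/null
}

# Step 5: turn a stamp such as 14apr14 into a sortable yyyymmdd string.
stamp_to_sortable() {
    stamp=$1
    day=${stamp%%[a-z]*}      # leading digits
    rest=${stamp#"$day"}
    mon=${rest%%[0-9]*}       # month abbreviation
    yy=${rest#"$mon"}         # trailing two-digit year
    case $day in ''|*[!0-9]*) return 1 ;; esac
    case $yy in ''|*[!0-9]*) return 1 ;; esac
    case $mon in
        jan) mm=01 ;; feb) mm=02 ;; mar) mm=03 ;; apr) mm=04 ;;
        may) mm=05 ;; jun) mm=06 ;; jul) mm=07 ;; aug) mm=08 ;;
        sep) mm=09 ;; oct) mm=10 ;; nov) mm=11 ;; dec) mm=12 ;;
        *) return 1 ;;
    esac
    day=${day#0}              # avoid octal parsing of 08 and 09
    printf '20%s%s%02d\n' "$yy" "$mm" "$day"
}

# Step 5: print the argument whose filename carries the latest date stamp.
latest_tar() {
    best=""; best_key=""
    for f in "$@"; do
        base=$(basename "$f" .tar)
        key=$(stamp_to_sortable "${base##*_}") || continue
        if [ -z "$best_key" ] || [ "$key" -gt "$best_key" ]; then
            best=$f; best_key=$key
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Example calls (the glob pattern is an assumption about file naming):
#   check_serial_writable /dev/ttyS0
#   telnet_enabled /etc/xinetd.d/telnet || echo "telnet is disabled"
#   latest_tar /mss/home/rom/hv_ahut2_*.tar
```

Because a wrong serial permission fails silently, running the first check before starting the server is a cheap way to rule that cause out.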

What can go wrong?

Common troubleshooting items are listed in the User Guide (search for troubleshooting):

https://hallaweb.jlab.org/wiki/index.php?title=HV_HowTo_for_Users

Below is a list of other problems I've seen and their solutions.

1. Power-cycling an HV crate has been known to help, especially if more than one HV card does not respond. In that case it's probably a crate problem, not a card problem.

2. Cannot log in to an intelPC as root. This is probably because you don't have /root, in which case the solution is to mount it by doing the following as superuser on adaql1: "cd /root ; ./mount_diskless". (I think this has been fixed now, though.)

3. No connection to server. Try restarting the server. See item 5 below for instructions.

4. If you try to run the Shim software by hand, you might see "Can't connect to mainframe", and the Shim script silently dies! This is a good one. The problem is that on IntelPCs the /shim directory where the software lives is not writeable, so you must run from /root/shim, which is a local writeable disk area.

5. Restarting the servers.

This was a big problem for a while, but I think it has largely gone away (see item 8). The symptom is that the Java GUI loses connection and/or emits false alarms.

On ahut* the servers are run from the ahut account. If you need to restart:

[ahut@ahut2 ~]$ ps awx | grep -i shim

and "kill -9" the PID. Then it will either restart on its own within a few minutes (there's a cron job), or you can restart by hand:

/home/ahut/scripts/startHV

On the intelPCs we do not (yet) have an ordinary account -- only root. The simplest way to restart the server is to reboot the intelPC. However, if you know the root password, you can follow a procedure similar to the one above to kill and restart the HV. The HV is started on an intelPC with the command /shim/scripts/prepHV, which runs as a cron job under root.

6. Another possibility is that the mainframe really cannot be connected to. The first thing to try is to run, by hand, ./shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux*, where the suffix depends on the computer (* = _bob for ahut, = S0 on halladaq8, = S2 on intelha3). If that runs ok, the HV crate is on and is talking. If not, then check your cabling and power status.

7. On Feb 20, 2014 I found I could not raise HV on any of the 4 cards in a particular crate. Power cycling, pulling out cards, reseating them -- nothing helped. I tested the 4 cards in another crate and they worked. There are some halogs about this. Finally, I put the cards back into the original crate but in different slots, and they worked again -- leading to much speculation about bad slots, bent pins, poor contact, and possible temperature effects (meaning it will happen again ?).

8. On Feb 25, 2014 it was reported that cards 4, 6, and 9 on the upper HV crate on R-HRS did not appear in the list, despite repeated reboots of the server. What I did to recover: a) save HV settings, b) turn off HV in software, c) turn off HV in hardware (two switches), d) turn on HV again. The cards came back. I admit this is not very satisfying, though. Later we found that the "fast" version of the software runs fast but eventually causes cards to be disabled, such that only a power-cycle restores them. The "slow" version that runs on an IntelPC works reliably for many days, so we've reverted to that version on all platforms. Slow, but steady.
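
The kill-and-restart procedure from item 5 can be sketched as a small shell helper. This is an illustration only: shim_pids and restart_shim are hypothetical names, and the sketch assumes the ahut setup described in item 5 (processes whose command line contains "shim", restart via /home/ahut/scripts/startHV).

```shell
#!/bin/sh
# Read `ps awx`-style lines on stdin and print the PID (first column) of
# every line mentioning "shim", skipping the grep process itself.
shim_pids() {
    grep -i 'shim' | grep -v 'grep' | awk '{print $1}'
}

# Kill every matching process, then restart by hand instead of waiting
# for the cron job. Assumes the ahut account layout from item 5.
# Caution: do not save this under a filename containing "shim", or the
# script would match (and kill) itself.
restart_shim() {
    for pid in $(ps awx | shim_pids); do
        kill -9 "$pid"
    done
    /home/ahut/scripts/startHV
}

# Example (run as the ahut user):
#   restart_shim
```

Separating the PID filter from the kill step makes it possible to run "ps awx | shim_pids" first and inspect which processes would be affected.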