Difference between revisions of "HV technical"

Revision as of 15:17, 13 January 2014

R. Michaels, Nov 2013

Here is some extremely boring technical info about HV control. If you need to get some sleep, read on.

First, note these somewhat old URLs as a starting point

http://hallaweb.jlab.org/equipment/daq/HVhelp.html

http://hallaweb.jlab.org/equipment/daq/gen_slow_control.html

The above URLs will need to be modified and updated to provide User guidance. But they are a nice start.

A note about the HV software architecture:

1. "hvs" (aka "hvg") = Java GUI that provides User interface (author: Roman Pomatsalyuk)

2. LecroyHV_Shim = Perl server, listens on a telnet port, talks to hvs. This code emulates the obsolete (no-longer-existing) motherboards on the LeCroy 1458 HV crate (author: Brad Sawatzky). This server must run on the PC that is connected by RS232 to the HV crate.

3. i2lchv_linux_bob = Bob's version of Brad's version of Javier's code that does the low-level communication with the HV crate. (original author: Javier Gomez, but don't blame him for my hacks). i2lchv_linux_bob talks via RS232 to the HV cards using a primitive "language". The LecroyHV_Shim Perl script connects to i2lchv_linux_bob via the "Expect" module of Perl.

Nov 12, 2013. Attempts to use an Intel PC as the PC near the LeCroy HV crate.

from root on adaql1, see /root/diskless/i386/Centos5.8

./snapshot = directories local to the cpus. e.g. see /snapshot/intelha3 for intelha3.

./root = directory shared by all the cpus. However, not (yet) writeable by the cpus.

On the intel cpu, if you want to create or touch files, you must use the local filesystem.

/root/diskless/i386/Centos5.8/snapshot/intelha3/root (from adaql1)

appears as

/root (local filesystem on intelha3)

However, if you want to compile code that every intel PC uses, you must use

/root/diskless/i386/Centos5.8/root on adaql1

/shim (on intelha3) is the shared shim directory from /root/diskless/i386/Centos5.8/root/shim on adaql1

Nov 12, 2013. Attempting to run HV control on L-HRS.

Login to intelha3 as root.

Reminder
/root/shim  is the LOCAL copy.  Must run Shim server here because this filesystem can be written to.
/shim is the SHARED copy (on adaql1).  Must compile here (on adaql1)

To check that HV cards are seen via RS232, go to /root/shim and run /shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob

I'm running the code in /root/shim because, if it needs to write output, this filesystem allows it. But the code comes from /shim, which is the shared area.

A note: if this code needs to be recompiled, you must do the compilation in the shared disk area (on adaql1) and then run the executable from the local area.

If when you run i2lchv_linux_bob you see a list of HV cards, then it's a good sign.

Next, run the Perl Server: go to /root/shim/ and type /shim/LecroyHV_shim/LecroyHV_Shim (Note, I've attempted to automate this step, so it is running 24/7 even after rebooting the intel PC. see below)

On adaql1, as adev account, do this: "cd ./slowc ; ./hvs LEFT"

How to modify programming on intelha3 or other intel PCs

On adaql1 as root, in /root/diskless/i386/Centos5.8/root/shim/LecroyHV_shim/LecroyHV_FE and related areas. Modify code and compile here.

On intelha3 the files appear as /shim/LecroyHV_shim/LecroyHV_FE (etc)

Attempt to automate the startup of Shim server so that users don't need to care.

On intelha3, have a crontab entry for root

[root@intelha3 LecroyHV_shim]# crontab -l

# Start the shim server for HV
2,10,20,30,40,50 * * * * /shim/scripts/prepHV

The script prepHV checks if Shim is running. If it is, do nothing. If not, start it.

Notes about pre-requisites for making the Perl server work.

Perl 5.8.8 or later must be installed. On the intel PC it appears as /perl after being put (on adaql1) in the shared area.

RH5 or later is needed, to have the correct glibc.

/dev/ttyS0 permissions: they get reset at boot.

Install the telnet server and allow telnet in /etc/xinetd.d, then restart xinetd:

yum install telnet-server

Kill cfengine, which tends to rather rudely delete /etc/xinetd.d/telnet. Yes, telnet is an old, insecure protocol, but it's needed by this software. The Java code uses telnet to communicate with the Perl server.

/etc/xinetd.d/telnet needs "disable = no"

On ahut1 or ahut2, I have a cron script /root/scripts/prepHV which takes care of running the server automatically. It also periodically restores the "telnet" file mentioned above, and periodically restarts xinetd

Simple test of the telnet server: if you are on, for example, ahut2 and can "telnet ahut2" (i.e. telnet into yourself), then the server is running.

Nov 13, 2013

Adding a new Intel VME PC to the hall

add to /etc/dhcpd.conf on adaql2

host halladaq8
              {
              filename "linux-install/pxelinux.0"; # File location relative to /tftpboot/
              next-server 129.57.164.32;            # TFTP server
#              hardware ethernet 00:20:38:04:6D:12; # Client MAC XVB601
#              hardware ethernet 00:20:CE:F6:03:EE; # Client MAC XVR14
#              hardware ethernet 00:21:70:AF:3B:46; # PC Magali LPC Clermont
              hardware ethernet 00:20:38:04:6D:0E; # Right Intel CPU HV control XVB601
              fixed-address 129.57.164.15;          # Client IP
              }
}

Then need to restart:

/etc/rc.d/init.d/dhcp restart

On adaql1

Need to add IP to adaql1:/root/tftpboot/linux-install/pxelinux.cfg

8139A40F is the hex version of 129.57.164.15 = halladaq8 (129 = 0x81, 57 = 0x39, 164 = 0xA4, 15 = 0x0F)
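
The file name can be computed rather than worked out by hand. Below is a minimal sketch in C; the function name pxe_hex_name is made up for illustration and is not part of any of the HV code. It relies only on the PXELINUX convention that a per-host config file is named after the client IP address written as 8 uppercase hex digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Convert a dotted-quad IPv4 address into the 8-digit uppercase hex
 * name that PXELINUX looks for under pxelinux.cfg/.
 * Hypothetical helper, for illustration only.
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int pxe_hex_name(const char *ip, char *out, size_t outlen)
{
    unsigned a, b, c, d;
    char extra;

    assert(ip != NULL && out != NULL);

    /* "255.255.255.255" is 15 characters; 8 hex digits need 9 bytes */
    if (strlen(ip) > 15 || outlen < 9)
        return -1;

    /* The trailing %c catches garbage after the fourth number */
    if (sscanf(ip, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;

    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    snprintf(out, outlen, "%02X%02X%02X%02X", a, b, c, d);
    return 0;
}
```

Calling pxe_hex_name("129.57.164.15", buf, sizeof buf) with a 9-byte buffer fills buf with 8139A40F, which matches the file name used here.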

What this file (8139A40F) looks like :

[root@adaql1 pxelinux.cfg]# pwd
/root/tftpboot/linux-install/pxelinux.cfg
[root@adaql1 pxelinux.cfg]# more 8139A40F
default Centos_5.8_32_bit

label Centos_5.8_32_bit
    kernel Centos_5.8_32_bit/vmlinuz
    append  initrd=Centos_5.8_32_bit/initrd.img root=/dev/ram0 init=disklessrc NFSROOT=129.57.164.32:/diskless/i386/Centos5.8 ramdisk_size=23904 ETHERNET=eth0 SNAPSHOT=intelha1 NISDOMAIN=CCHP vga=0x305 acpi=force vmalloc=256MB

Some things had to be enabled in the BIOS too, namely the network and the boot sequence. Alex did it too fast.

Nov 14, 2013

Today I noticed that although intelha3 has /root from adaql1:/root/diskless/i386/Centos5.8/snapshot/intelha3/root/

halladaq8 does NOT have /root from adaql1:/root/diskless/i386/Centos5.8/snapshot/halladaq8/root/

Instead, halladaq8 has an independent /root which has nothing to do with adaql1. Weird.

Jan 2, 2014

Working on clrlpc (DVCS setup) in ~/bobdev, I have made some tests to optimize the Shim server. The diffs are below. Here are some speed comparisons so far.

               Timing Tests for HV GUI (times in seconds)

phase                          Portserver    Shim before mods    Shim after mods

Initializing                   17            100                 25

Turn on HV                     10            35                  15

Enable and set HV on 1 card    5 + typing    30 + typing         5 + typing

While this sounds nice, there is still a problem with the Shim software. There seems to be a probability "P" that a given command is ignored. "P" is small and hard to measure, but my impression is that before mods P = 0.05 and after the speedup mods, P = 0.1 or so. I have a few more ideas still to try.

Here are the diffs.

diff i2lchv_linux_bob.c ~/slowc/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob.c
17,19c17
< #define  SDELAY1    40000 /* checkACK delay (usec) */
< #define  SDELAY2    20000 /* pollDSR delay  (usec) */ 
< #define  SDELAY3    10000 /* main -- init loop over modules (usec) */
---
> #define  SDELAY     200000  /* in usecs (0.1 sec is stable for all but MC command (wants 0.2s), go figure) */
22c20
< #define  MAXPOLL    30    /* how many loops in pollDSR */
---
> #define  MAXPOLL    20
53c51
<   usleep(SDELAY1);
---
>   usleep(SDELAY);
99c97
<     usleep(SDELAY2);
---
>     usleep(SDELAY);
172c170
<     usleep(SDELAY3);  /* short wait for a response */
---
>     usleep(SDELAY);  /* short wait for a response */

and

dvcs@clrlpc.jlab.org> diff LecroyHV_Shim ~/slowc/shim/LecroyHV_shim/LecroyHV_Shim 
90c90
< my $PSUM_CACHE_TIME = 5;
---
> my $PSUM_CACHE_TIME = 10;
95c95
< my $GS_STALE_TIME = 25;
---
> my $GS_STALE_TIME = 15;

comments

1. The GS_STALE_TIME change seemed to alleviate the problem that the Shim got into a loop of GS being stale. Not sure if this is related to the probability "P" mentioned above.

2. I'm not sure if MAXPOLL should be adjusted. Making it very large causes "P" to increase, I think.
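
To see what the MAXPOLL and delay settings imply, here is a minimal sketch of a fixed-delay polling loop. It assumes that pollDSR sleeps once per try and gives up after MAXPOLL tries, which is what the comments in the diff suggest but which is not confirmed against the full source. The names poll_fixed_delay, fake_dsr_ready and worst_case_poll_usec are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Callback that reports whether the device is ready (nonzero = ready). */
typedef int (*ready_fn)(void *ctx);

/* Poll up to maxpoll times, sleeping delay_usec after each failed try.
 * Returns the number of tries that were needed (1..maxpoll),
 * or -1 if the device never became ready. */
int poll_fixed_delay(ready_fn ready, void *ctx, int maxpoll, unsigned delay_usec)
{
    assert(ready != NULL);
    for (int i = 1; i <= maxpoll; i++) {
        if (ready(ctx))
            return i;
        usleep(delay_usec);
    }
    return -1;
}

/* Stand-in for reading the real DSR line: *ctx holds how many
 * "not ready" answers to give before reporting ready. */
int fake_dsr_ready(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}

/* Longest time (usec) the loop above can sleep before giving up. */
long worst_case_poll_usec(long maxpoll, long delay_usec)
{
    return maxpoll * delay_usec;
}
```

Under that assumption, the original settings (MAXPOLL = 20, SDELAY = 200000) allow up to 4 s per poll, while the modified settings in the diff above (MAXPOLL = 30, SDELAY2 = 20000) allow up to 0.6 s. So raising MAXPOLL while shrinking the delay can still shorten the total time the loop is willing to wait.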

working ....

Update Jan 3, 2014

Brad emphasized that we should confirm that the DSR/busy line is being toggled in the "right" way. Does it ever go high without a card being queried? Does it go high after receiving a command that will generate a response, or does it ever fail to go high? Does it go low before the response is read out?

I don't know how to check these things, but I assume: a) it could be checked with a scope; b) the DVCS hardware doesn't suffer from this since at the moment the crate works fine; c) not an issue with the crate if Jack did a good job; d) if a module is bad one should find it with low-level inquiries, then replace the module.

Brad also said that "updating stale GS" is nothing to worry about. Ok, good.

Latest set of changes is listed below. It is significantly faster. Note, reducing SDELAY1 appears to be risky.

Reliability issue: The probability "P" for a GUI operation to fail (i.e. simply not happen, so that you have to try again) is zero for enabling / disabling a channel or setting a HV value. Good! But it might be an accident -- see below.

However, for turning on or off the crate, P was initially 50% today. I think I improved this by turning off some more printlog calls in the Shim, and I think the remaining small P (about 10%) depends on print statements in the Java GUI, as I noticed that if you press HV ON or OFF when the GUI is printing, it may not do the transition. But if the GUI is not printing, the transition is reliable (P = 0). Need to look at that.

If you do succeed in turning on or off HV, it takes about 20 sec, which is 5x faster than before but still twice as long as the Portserver version.

dvcs@clrlpc.jlab.org> diff LecroyHV_Shim ~/slowc/shim/LecroyHV_shim/LecroyHV_Shim
86d85
< my $verbose = 0;   # = 1 print verbosely
91c90
< my $PSUM_CACHE_TIME = 5;
---
> my $PSUM_CACHE_TIME = 10;
96c95
< my $GS_STALE_TIME = 10;
---
> my $GS_STALE_TIME = 15;
247,249c246
<     if ( $verbose == 1 ) {
<        printlog ("[$id - $peer]: $message");
<     }
---
>     printlog ("[$id - $peer]: $message");
252,254c249
<     if ( $verbose == 1 ) {
<       printlog ("[R] '$response'");
<     }
---
>     printlog ("[R] '$response'");
476,478c471
<   if ( $verbose == 1 ) {
<     printlog("HV1458_handler: '$full_command'");
<   }
---
>   printlog("HV1458_handler: '$full_command'");
813,815c806
<     if ( $verbose == 1 ) {
<       printlog( "Updating stale GS.\n" );
<     }
---
>     printlog( "Updating stale GS.\n" );
843,845c834
<     if ( $verbose == 1 ) {
<       printlog( "Using cached psum($lslot)\n" );
<     }
---
>     printlog( "Using cached psum($lslot)\n" );
1058,1060d1046
< 
<   printlog("Response to poll_HV_status() = '$resp'");
< 
1111,1114c1097
<   if ( $verbose == 1 ) {
<     printlog "generic_cmd($cmd, $lslot)";
<   }
< 
---
>   printlog "generic_cmd($cmd, $lslot)";
1122,1124c1105
<   if ( $verbose == 1 ) {
<     printlog "remcmd: $remcmd";
<   }
---
>   printlog "remcmd: $remcmd";
1211,1213c1192
<     if ( $verbose == 1 ) {
<       printlog("prop: $prop\n");
<     }
---
>     printlog("prop: $prop\n");

and

diff i2lchv_linux_bob.c ~/slowc/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob.c
<  * bob's version with O_NONBLOCK and 3 values of SDELAY
<  *       to try to optimize the speed. 
<  * i2lchv_linux_bob.c
---
>  * bob's version with O_NONBLOCK 
> * i2lchv_linux_bob.c
18,20c17
< #define  SDELAY1    50000 /* checkACK delay (usec) <=20K causes failure*/
< #define  SDELAY2    5000 /* pollDSR delay  (usec) */ 
< #define  SDELAY3    10000 /* main -- init loop over modules (usec) */
---
> #define  SDELAY     200000  /* in usecs (0.1 sec is stable for all but MC command (wants 0.2s), go figure) */
23c20
< #define  MAXPOLL    50    /* how many loops in pollDSR */
---
> #define  MAXPOLL    20
54c51
<   usleep(SDELAY1);
---
>   usleep(SDELAY);
100c97
<     usleep(SDELAY2);
---
>     usleep(SDELAY);
173c170
<     usleep(SDELAY3);  /* short wait for a response */
---
>     usleep(SDELAY);  /* short wait for a response */

Jan 10, 2014

Disappointingly, deploying the above code in Hall A on HRS did not help. The logic of waiting by a fixed delay is wrong. Instead, we should check in a loop for "good data". Good data is defined as starting and ending with specific characters. I will try this next. The following e-mail from Javier Gomez explains.

Bob,

Not sure that I can explain what is going on but here is what I recall,

(1) if a module gets the ACK sequence:  geographic address, ACK (0x06), line-feed
it will attempt to dump the contents of its output buffer or just send the sequence: ACK, line-feed, carriage-return

(2) if a module gets a command, it responds
        (1st) ACK, line-feed, carriage-return  to acknowledge it got a command
        (2nd) starts preparing the command response - when it is ready, it raises the ATT* line which Linux has hard time handling as interrupt
        (3rd) when the ATT* line goes up, the sequence: geographic address, ACK (0x06), line-feed needs to be sent for the module to start transferring the data back
        (4th) the response , besides the ACK character it should also have a tag sent with the command so that it can be identified in case that commands were concatenated.

(3) At the rate that the modules operate (38.4 kB), it takes 0.260 ms per character - a string of 40 characters, will take ~ 10.4 ms to transfer.

Options to go around Linux not being interrupted easily,

(a) send a command and wait before proceeding - I think that in the very old test code I  used two waits: 10 ms & 100 ms (the equivalent to SDELAY1)
        (1) 10 ms after sending a ACK sequence only (geographic address, ACK (0x06), line-feed)
        (2) 100 ms  guessing/waiting for the module to respond.
     this may have been reasonable for testing but if the module is late responding (e.g. busy dealing with what it thinks are changes in the voltages due to noise) then it may not answer in that window and things will get out of control because one is assuming the answer will be there. This is the method used in the code you have.

b) another possibility somewhat more cumbersome which I did not try at that time is,
        (1) shortly after sending a command  (0.260 ms per character X number of characters), the addressed module should respond with the ACK sequence (ACK\n\r) - 3 characters ~ 1 ms plus some time to check the command
        (2) you could then wait for some time, say 5 ms, and send the sequence: geographic address, ACK (0x06), line-feed  <== time for module to get ready with answer
        (3) wait for say 10 ms (equivalent to ~ 40 characters to be transferred back from the module)
        (4) read the input buffer,
               (4.1) if the input buffer has the sequence (ACK\n\r), the response is not ready yet <== go back to (2) & you can skip the 5 ms because the sequence was received 10ms ago
               (4.2) if the buffer has something that starts with ACK followed by the unique tag you sent with the command, the response has started. You then want to check that the last two entries in the buffer are \n\r which indicate the module finished transferring the data. If they are not there, there should be more data in the buffer or something got lost
        Depending on the system (e.g. assuming that linux really handles correctly the buffer swap) you should be able to reduce some of the above time waits (e.g. get incomplete buffers that you can recognize because the sequence \n\r is not there). In principle, this should be able to cope with modules having problems answering

Most of my tests were actually done not with Linux (the test code was from 2009) but with a realtime system so I was able to use interrupts and the ATT* line. I did some tests with a small board computer using Linux so that I could poll a bit to determine if the ATT* line has been raised  but I did not complete them and I doubt I will get to it any time soon. One thing that I found in those tests however is that the default configuration of the serial port seems to change with the Linux version & flavor. I do not think that this is your problem but here it is what I had to add while setting the serial port attributes. You probably also want to make sure that the modem control lines are disabled to avoid any noise, but this should already be true.

  rs232_attr.c_cc[VTIME]= 0; /* read - inter-character timer unused */
  rs232_attr.c_cc[VMIN] = 0; /* read - minimum number of characters */
 
Hope that it helps - Javier
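
For reference, the two settings quoted at the end of the e-mail fit into a serial port setup roughly as follows. This is a sketch, not the code in i2lchv_linux_bob.c: the function name hv_serial_attrs is hypothetical, the 38400 baud rate is taken from the e-mail, and 8 data bits with no parity is an assumption.

```c
#include <assert.h>
#include <string.h>
#include <termios.h>

/* Fill *t with raw-mode settings for the HV serial link.
 * Hypothetical helper; 8 data bits / no parity is an assumption.
 * Returns 0 on success, -1 if the baud rate could not be set. */
int hv_serial_attrs(struct termios *t)
{
    assert(t != NULL);

    /* Start from all-zero flags: no input/output processing and
     * non-canonical mode (c_lflag = 0, so ICANON and ECHO are off). */
    memset(t, 0, sizeof *t);

    /* 8 data bits, receiver enabled, modem control lines ignored */
    t->c_cflag = CS8 | CREAD | CLOCAL;

    /* The two settings from the e-mail */
    t->c_cc[VTIME] = 0;  /* read - inter-character timer unused */
    t->c_cc[VMIN]  = 0;  /* read - minimum number of characters */

    /* Set the speed last, since it may be stored in c_cflag */
    if (cfsetispeed(t, B38400) != 0)
        return -1;
    if (cfsetospeed(t, B38400) != 0)
        return -1;
    return 0;
}
```

The result would be applied with tcsetattr(fd, TCSANOW, &t) on a descriptor opened on the serial device (/dev/ttyS0 here). With VMIN = 0 and VTIME = 0, read() returns immediately with whatever bytes are available, possibly none, so the caller has to loop and check for a complete reply itself.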

Jan 13, 2014

I implemented a new kind of check that replaces checkACK in the "loop forever" of Javier's C code. This loop is the ongoing dialog with the Perl Shim server. The new check is called bob1ACK. This code looks for a sequence "0x06 + [N char] + \n \r" where [N char] is N characters that may result from a reply of the card. In the simplest reply, N=0. This would be an acknowledgement (ACK) sequence. But for a reply, N can be larger, e.g. N=82. If a reply has not finished yet, you get no \n and \r, but only a truncated string. So, one must read again until the \n and \r are found, and one must concatenate the readings to form the reply. This is done now and seems to be fairly robust and fast.
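
The framing rule described above can be sketched as follows. This is not the actual bob1ACK code; the names classify_frame, append_chunk and the frame_state values are made up for illustration, and the sketch does not check the command tag that Javier mentions in his e-mail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HV_ACK 0x06  /* ACK character that starts every reply */

enum frame_state {
    FRAME_INCOMPLETE,  /* nothing yet, or no trailing \n \r so far */
    FRAME_ACK_ONLY,    /* exactly ACK \n \r  (N = 0) */
    FRAME_REPLY,       /* ACK + N characters + \n \r  (N > 0) */
    FRAME_BAD          /* first byte is not ACK */
};

/* Classify the bytes accumulated so far. */
enum frame_state classify_frame(const unsigned char *buf, size_t len)
{
    if (len == 0)
        return FRAME_INCOMPLETE;
    if (buf[0] != HV_ACK)
        return FRAME_BAD;
    if (len < 3 || buf[len - 2] != '\n' || buf[len - 1] != '\r')
        return FRAME_INCOMPLETE;  /* truncated: read again and append */
    return (len == 3) ? FRAME_ACK_ONLY : FRAME_REPLY;
}

/* Append one read() worth of bytes to the accumulated reply.
 * Returns the new length, or -1 if the buffer would overflow. */
long append_chunk(unsigned char *acc, size_t acc_len, size_t cap,
                  const unsigned char *chunk, size_t n)
{
    assert(acc != NULL && chunk != NULL);
    if (acc_len > cap || n > cap - acc_len)
        return -1;
    memcpy(acc + acc_len, chunk, n);
    return (long)(acc_len + n);
}
```

A caller would read from the serial port, call append_chunk, and repeat until classify_frame stops returning FRAME_INCOMPLETE or a retry limit is reached. Checking the end of the buffer instead of sleeping for a fixed time is what lets a short reply finish quickly while a long one still gets read out completely.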

A problem with both old and new code: sometimes if you do the operation (example: turn ON the crate) the command is ignored. The solution seems to be to change a Target value for the HV on one channel. This will "wake up" the code and it will do the operation (in this example: turn ON HV and also change the demand HV).

Here are some speed data.


This is for 10 HV cards in the DVCS HV crate.  Times in seconds.
Old code with delay = 200K, as used on HRS now.  New code with bob1ACK

                   Old          New

start the GUI:      70           20

turn on/off HV      40           10

change a HV value   20           10
or enable/disable

It looks faster now.
