Difference between revisions of "HV technical"

R. Michaels, Nov 2013

Here is some extremely boring technical info about HV control.  Some of it is probably obsolete by the time you see it.  If you need to get some sleep, read on.

First, note these somewhat old URLs as a starting point:
  
 
http://hallaweb.jlab.org/equipment/daq/HVhelp.html
 
 
http://hallaweb.jlab.org/equipment/daq/gen_slow_control.html
 
The above URLs will need to be modified and updated to provide User guidance.  But they are a nice start.
  
A note about the HV software architecture:
  
 
1. "hvs" (aka "hvg") = Java GUI that provides User interface (author: Roman Pomatsalyuk)
 
  
 
/root (local filesystem on intelha3)
 
  
 
However, if you want to compile code that every intel PC uses, you must use  
 
  
 
/root/diskless/i386/Centos5.8/root on adaql1
 
 +
 +
/shim (on intelha3) is the shared shim directory from /root/diskless/i386/Centos5.8/root/shim on adaql1
 +
 +
-------------------------------------

Nov 12, 2013.  Attempting to run HV control on L-HRS.

Log in to intelha3 as root.
 +
 +
<pre>
Reminder
/root/shim  is the LOCAL copy.  Must run Shim server here because this filesystem can be written to.
/shim is the SHARED copy (on adaql1).  Must compile here (on adaql1)
</pre>
 +
 +
To check that the HV cards are seen via RS232, go to /root/shim and run /shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob

I run the code from /root/shim because this filesystem is writable, in case the code needs to produce output.  The executable itself comes from /shim, which is the shared area.

A note:  if this code needs to be recompiled, you must compile it in the shared disk area and then run the resulting executable from the local area.

If you see a list of HV cards when you run i2lchv_linux_bob, that is a good sign.

Next, run the Perl server:  go to /root/shim/ and type /shim/LecroyHV_shim/LecroyHV_Shim  (Note: I've attempted to automate this step so that it runs 24/7, even after rebooting the Intel PC; see below.)

On adaql1, as the adev account, do this:  "cd ./slowc ; ./hvs LEFT"
 +
 +
How to modify the code for intelha3 or other Intel PCs

On adaql1, as root, work in /root/diskless/i386/Centos5.8/root/shim/LecroyHV_shim/LecroyHV_FE and related areas.  Modify the code and compile it there.

On intelha3 the files appear as /shim/LecroyHV_shim/LecroyHV_FE  (etc.)

Attempt to automate the startup of the Shim server so that users don't need to care.

On intelha3, there is a crontab entry for root
 +
 +
[root@intelha3 LecroyHV_shim]# crontab -l
<pre>
# Start the shim server for HV
2,10,20,30,40,50 * * * * /shim/scripts/prepHV
</pre>
 +
 +
The script prepHV checks if Shim is running.  If it is, do nothing.  If not, start it.
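
The check-and-start logic described above can be sketched as follows.  This is only an illustration in Python, not the actual prepHV script (which is not reproduced on this page); the function names and the use of "ps -eo args" to list processes are assumptions.

```python
import subprocess

SHIM_CMD = "/shim/LecroyHV_shim/LecroyHV_Shim"  # server path from the notes above
SHIM_CWD = "/root/shim"                         # writable local area

def list_processes_ps():
    """Assumed helper: command lines of all processes, via 'ps -eo args'."""
    out = subprocess.run(["ps", "-eo", "args"], capture_output=True, text=True)
    return out.stdout.splitlines()

def shim_is_running(list_processes):
    """Return True if any running command line mentions the Shim server."""
    return any("LecroyHV_Shim" in line for line in list_processes())

def prep_hv(list_processes=list_processes_ps, start=None):
    """Start the Shim only if it is not already running; report what was done."""
    if shim_is_running(list_processes):
        return "already running"
    if start is None:
        # Default action (assumption): launch the server from the local area.
        start = lambda: subprocess.Popen([SHIM_CMD], cwd=SHIM_CWD)
    start()
    return "started"
```

Because the process lister and the start action are passed in as arguments, the decision logic can be tried out without touching a real server.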
 +
 +
 +
-----------------------------------------
 +
 +
<font color="red">Notes about pre-requisites for making the Perl server work.</font>
 +
 +
Perl 5.8.8 or later must be installed.  On the Intel PC this appears as /perl after being put (on adaql1) in the shared area.

RH5 or greater is needed, to have the correct glibc.

/dev/ttyS0 permissions:  they get reset at boot.

Install the telnetd server, and enable telnet in /etc/xinetd.d (restart xinetd afterwards)

yum install telnet-server

Kill cfengine, which tends to rather rudely delete /etc/xinetd.d/telnet.  Yes, telnet is an old, insecure protocol, but it's needed by this software.  The Java code uses telnet to communicate with the Perl server.

/etc/xinetd.d/telnet  needs "disable = no"

On ahut1 or ahut2, I have a cron script /root/scripts/prepHV which takes care of running the server automatically.
It also periodically restores the "telnet" file mentioned above, and periodically restarts xinetd.

Simple test of the telnet server:  if you are on, for example, ahut2 and can "telnet ahut2" (i.e. telnet into yourself), then the server is running.
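
The same self-test can be done from a script.  The sketch below only checks that something accepts TCP connections on the standard telnet port (23); the function name is made up for this example, and it says nothing about whether the Shim itself is healthy.

```python
import socket

def telnet_port_open(host, port=23, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not running".
        return False
```

For example, telnet_port_open("ahut2") run on ahut2 corresponds to the "telnet into yourself" test.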
 +
 +
NOTE on getting Perl to work on a standard RHEL 6 i386 CUE level 2 install.

I had to install these modules to get Brad's Shim server to work

rpm -ivf perl-IO-Tty-1.08-3.el6.i686.rpm

rpm -ivf perl-IO-Multiplex-1.13-1.el6.rf.noarch.rpm

rpm -ivf perl-Net-Server-0.97-7.el6.noarch.rpm

I think that's all I had to do to the default Perl installation on RHEL6, but more work was needed to get the C code to compile.
 +
 +
<pre>
This is probably provided by some "devel" package, but whatever ...

[root@hvsrv1 include]# pwd
/usr/include
[root@hvsrv1 include]# mkdir readline
[root@hvsrv1 include]# cd readline
[root@hvsrv1 readline]# scp highv@hvsrv2:/usr/include/readline/all.tar .
highv@hvsrv2's password:
all.tar                                       100%  70KB  70.0KB/s  00:00
[root@hvsrv1 readline]# tar xvf all.tar
chardefs.h
history.h
keymaps.h
readline.h
rlconf.h
rlstdc.h
rltypedefs.h
tilde.h
</pre>
 +
 +
Also:
 +
 +
<pre>
yum install ncurses-devel-5.7-3.20090208.el6.i686

Also
[root@hvsrv1 lib]# pwd
/lib
[root@hvsrv1 lib]# ln -s libreadline.so.6.0 libreadline.so
</pre>
 +
 +
-----------------------------------------------
 +
 +
Nov 13, 2013
 +
 +
Adding a new Intel VME PC to the hall
 +
 +
add to /etc/dhcpd.conf on adaql2
 +
 +
<pre>
host halladaq8
              {
              filename "linux-install/pxelinux.0"; # File location relative to /tftpboot/
              next-server 129.57.164.32;            # TFTP server
#              hardware ethernet 00:20:38:04:6D:12; # Client MAC XVB601
#              hardware ethernet 00:20:CE:F6:03:EE; # Client MAC XVR14
#              hardware ethernet 00:21:70:AF:3B:46; # PC Magali LPC Clermont
              hardware ethernet 00:20:38:04:6D:0E; # Right Intel CPU HV control XVB601
              fixed-address 129.57.164.15;          # Client IP
              }
}
</pre>
 +
 +
Then restart the DHCP server:

/etc/rc.d/init.d/dhcp restart

On adaql1

Add a file named after the IP address to adaql1:/root/tftpboot/linux-install/pxelinux.cfg

8139A40F is the hex version of 129.57.164.15 = halladaq8
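
The file name is just the four octets of the IP address written as two uppercase hex digits each.  A small sketch (the function name is made up for this example):

```python
def pxelinux_cfg_name(ip):
    """Uppercase hex form of a dotted-quad IPv4 address, two digits per octet."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(o < 0 or o > 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    return "".join("%02X" % o for o in octets)

print(pxelinux_cfg_name("129.57.164.15"))  # 8139A40F
```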
 +
 +
What this file (8139A40F) looks like :
 +
 +
<pre>
[root@adaql1 pxelinux.cfg]# pwd
/root/tftpboot/linux-install/pxelinux.cfg
[root@adaql1 pxelinux.cfg]# more 8139A40F
default Centos_5.8_32_bit

label Centos_5.8_32_bit
    kernel Centos_5.8_32_bit/vmlinuz
    append  initrd=Centos_5.8_32_bit/initrd.img root=/dev/ram0 init=disklessrc NFSROOT=129.57.164.32:/diskless/i386/Centos5.8 ramdisk_size=23904 ETHERNET=eth0 SNAPSHOT=intelha1 NISDOMAIN=CCHP vga=0x305 acpi=force vmalloc=256MB
</pre>
 +
 +
Some things had to be enabled in the BIOS, too:  the network and the boot sequence.  Alex did it too fast for me to follow.
 +
 +
---------------------------------------------------------------------
 +
 +
Nov 14, 2013
 +
 +
Today I noticed that although intelha3 has /root from
adaql1:/root/diskless/i386/Centos5.8/snapshot/intelha3/root/

halladaq8 does NOT have /root from
adaql1:/root/diskless/i386/Centos5.8/snapshot/halladaq8/root/

Instead, halladaq8 has an independent /root which has nothing to do with adaql1.
Weird.
 +
 +
----------------------------------------------------------------
 +
 +
Jan 2, 2014
 +
 +
Working on clrlpc (DVCS setup) in ~/bobdev, I have made some tests to optimize the Shim server.
 +
The diffs are below.  Here are some speed comparisons so far.
 +
 +
<pre>
              Timing Tests for HV GUI (time in seconds)

phase                         Portserver    Shim before mods    Shim after mods

Initializing                  17            100                 25

Turn on HV                    10            35                  15

Enable and set HV on 1 card   5 + typing    30 + typing         5 + typing
</pre>
 +
 +
While this sounds nice, there is still a problem with the Shim software.  There seems to be a probability "P" that a given command is ignored.  "P" is small and hard to measure, but my impression is that P = 0.05 before the mods and P = 0.1 or so after the speedup mods.  I have a few more ideas still to try.
 +
 +
Here are the diffs.
 +
 +
<pre>
 +
diff i2lchv_linux_bob.c ~/slowc/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob.c
 +
17,19c17
 +
< #define  SDELAY1    40000 /* checkACK delay (usec) */
 +
< #define  SDELAY2    20000 /* pollDSR delay  (usec) */
 +
< #define  SDELAY3    10000 /* main -- init loop over modules (usec) */
 +
---
 +
> #define  SDELAY    200000  /* in usecs (0.1 sec is stable for all but MC command (wants 0.2s), go figure) */
 +
22c20
 +
< #define  MAXPOLL    30    /* how many loops in pollDSR */
 +
---
 +
> #define  MAXPOLL    20
 +
53c51
 +
<  usleep(SDELAY1);
 +
---
 +
>  usleep(SDELAY);
 +
99c97
 +
<    usleep(SDELAY2);
 +
---
 +
>    usleep(SDELAY);
 +
172c170
 +
<    usleep(SDELAY3);  /* short wait for a response */
 +
---
 +
>    usleep(SDELAY);  /* short wait for a response */
 +
</pre>
 +
 +
and
 +
 +
<pre>
 +
dvcs@clrlpc.jlab.org> diff LecroyHV_Shim ~/slowc/shim/LecroyHV_shim/LecroyHV_Shim
 +
90c90
 +
< my $PSUM_CACHE_TIME = 5;
 +
---
 +
> my $PSUM_CACHE_TIME = 10;
 +
95c95
 +
< my $GS_STALE_TIME = 25;
 +
---
 +
> my $GS_STALE_TIME = 15;
 +
</pre>
 +
 +
comments
 +
 +
1. The GS_STALE_TIME change seemed to alleviate the problem that the Shim got into a loop
 +
of GS being stale.  Not sure if this is related to the probability "P" mentioned above.
 +
 +
2. I'm not sure if MAXPOLL should be adjusted.  Making it very large causes "P" to increase,
 +
I think.
 +
 +
working ....
 +
 +
------------------------------------------------------------
 +
 +
Update Jan 3, 2014
 +
 +
Brad emphasized that we should confirm that the DSR/busy line is being toggled in the "right" way:  it should never toggle without a card being queried.  Does it go high after receiving a command that will generate a response?  Or does it ever fail to go high?  Does it go low before the response is read out?
 +
 +
I don't know how to check these things, but I assume: a) it could be checked with a scope; b) the DVCS hardware doesn't suffer from this since at the moment the crate works fine; c) not an issue with the crate if Jack did a good job; d) if a module is bad one should find it with low-level inquiries, then replace the module.
 +
 +
Brad also said that "updating stale GS" is nothing to worry about.  Ok, good.
 +
 +
Latest set of changes is listed below.  It is significantly faster. Note, reducing SDELAY1 appears to be risky.
 +
 +
Reliability issue:  The probability "P" for a GUI operation to fail (i.e. simply not happen, requiring a retry) is zero for enabling / disabling a channel or setting a HV value.  Good!  But this might be an accident; see below.

However, for turning the crate on or off, P was initially 50% today.  I think I improved this by turning off some more printlog calls in the Shim, and I think the remaining small P (about 10%) depends on print statements in the Java GUI:  I noticed that if you press HV ON or OFF while the GUI is printing, it may not do the transition.  But if the GUI is not printing, the transition is reliable (P = 0).  Need to look at that.

If you do succeed in turning HV on or off, it takes about 20 sec, which is 5x faster than before but still twice as long as the Portserver version.
 +
 +
<pre>
 +
dvcs@clrlpc.jlab.org> diff LecroyHV_Shim ~/slowc/shim/LecroyHV_shim/LecroyHV_Shim
 +
86d85
 +
< my $verbose = 0;  # = 1 print verbosely
 +
91c90
 +
< my $PSUM_CACHE_TIME = 5;
 +
---
 +
> my $PSUM_CACHE_TIME = 10;
 +
96c95
 +
< my $GS_STALE_TIME = 10;
 +
---
 +
> my $GS_STALE_TIME = 15;
 +
247,249c246
 +
<    if ( $verbose == 1 ) {
 +
<        printlog ("[$id - $peer]: $message");
 +
<    }
 +
---
 +
>    printlog ("[$id - $peer]: $message");
 +
252,254c249
 +
<    if ( $verbose == 1 ) {
 +
<      printlog ("[R] '$response'");
 +
<    }
 +
---
 +
>    printlog ("[R] '$response'");
 +
476,478c471
 +
<  if ( $verbose == 1 ) {
 +
<    printlog("HV1458_handler: '$full_command'");
 +
<  }
 +
---
 +
>  printlog("HV1458_handler: '$full_command'");
 +
813,815c806
 +
<    if ( $verbose == 1 ) {
 +
<      printlog( "Updating stale GS.\n" );
 +
<    }
 +
---
 +
>    printlog( "Updating stale GS.\n" );
 +
843,845c834
 +
<    if ( $verbose == 1 ) {
 +
<      printlog( "Using cached psum($lslot)\n" );
 +
<    }
 +
---
 +
>    printlog( "Using cached psum($lslot)\n" );
 +
1058,1060d1046
 +
<
 +
<  printlog("Response to poll_HV_status() = '$resp'");
 +
<
 +
1111,1114c1097
 +
<  if ( $verbose == 1 ) {
 +
<    printlog "generic_cmd($cmd, $lslot)";
 +
<  }
 +
<
 +
---
 +
>  printlog "generic_cmd($cmd, $lslot)";
 +
1122,1124c1105
 +
<  if ( $verbose == 1 ) {
 +
<    printlog "remcmd: $remcmd";
 +
<  }
 +
---
 +
>  printlog "remcmd: $remcmd";
 +
1211,1213c1192
 +
<    if ( $verbose == 1 ) {
 +
<      printlog("prop: $prop\n");
 +
<    }
 +
---
 +
>    printlog("prop: $prop\n");
 +
</pre>
 +
 +
and
 +
 +
<pre>
 +
diff i2lchv_linux_bob.c ~/slowc/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob.c
 +
<  * bob's version with O_NONBLOCK and 3 values of SDELAY
 +
<  *      to try to optimize the speed.
 +
<  * i2lchv_linux_bob.c
 +
---
 +
>  * bob's version with O_NONBLOCK
 +
> * i2lchv_linux_bob.c
 +
18,20c17
 +
< #define  SDELAY1    50000 /* checkACK delay (usec) <=20K causes failure*/
 +
< #define  SDELAY2    5000 /* pollDSR delay  (usec) */
 +
< #define  SDELAY3    10000 /* main -- init loop over modules (usec) */
 +
---
 +
> #define  SDELAY    200000  /* in usecs (0.1 sec is stable for all but MC command (wants 0.2s), go figure) */
 +
23c20
 +
< #define  MAXPOLL    50    /* how many loops in pollDSR */
 +
---
 +
> #define  MAXPOLL    20
 +
54c51
 +
<  usleep(SDELAY1);
 +
---
 +
>  usleep(SDELAY);
 +
100c97
 +
<    usleep(SDELAY2);
 +
---
 +
>    usleep(SDELAY);
 +
173c170
 +
<    usleep(SDELAY3);  /* short wait for a response */
 +
---
 +
>    usleep(SDELAY);  /* short wait for a response */
 +
</pre>
 +
 +
-----------------------------------------------------------------------
 +
 +
Jan 10, 2014
 +
 +
Disappointingly, deploying the above code in Hall A on HRS did not help.  The logic of waiting by a fixed delay is wrong.  Instead, we should check in a loop for "good data".  Good data is defined as starting and ending with specific characters.  I will try this next.  The following e-mail from Javier Gomez explains.
 +
 +
<pre>
 +
Bob,
 +
 +
Not sure that I can explain what is going on but here is what I recall,
 +
 +
(1) if a module gets the ACK sequence:  geographic address, ACK (0x06), line-feed
 +
it will attempt to dump the contents of its output buffer or just send the sequence: ACK, line-feed, carriage-return
 +
 +
(2) if a module gets a command, it responds
 +
        (1st) ACK, line-feed, carriage-return  to acknowledge it got a command
 +
        (2nd) starts preparing the command response - when it is ready, it raises the ATT* line which Linux has hard time handling as
 +
interrupt
 +
        (3rd) when the ATT* line goes up, the sequence: geographic address, ACK (0x06), line-feed needs to be sent for the module to
 +
start transferring the data back
 +
        (4th) the response , besides the ACK character it should also have a tag sent with the command so that it can be identified in
 +
case that commands were concatenated.
 +
 +
(3) At the rate that the modules operate (38.4 kB), it takes 0.260 ms per character - a string of 40 characters, will take ~ 10.4 ms to
 +
transfer.
 +
 +
Options to go around Linux not being interrupted easily,
 +
 +
(a) send a command and wait before proceeding - I think that in the very old test code I  used two waits: 10 ms & 100 ms (the equivalent
 +
to SDELAY1)
 +
        (1) 10 ms after sending a ACK sequence only (geographic address, ACK (0x06), line-feed)
 +
        (2) 100 ms  guessing/waiting for the module to respond.
 +
    this may have been reasonable for testing but if the module is late responding (e.g. busy dealing with what it thinks are changes
 +
in the voltages due to noise)
 +
then it may not answer in that window and things will get out of control because one is assuming the answer will be there. This is the
 +
method used in the code you have.
 +
 +
b) another possibility somewhat more cumbersome which I did not try at that time is,
 +
      (1) shortly after sending a command  (0.260 ms per character X number of characters), the addressed module should respond with
 +
the ACK sequence (ACK\n\r) - 3 characters ~ 1 ms
 +
        plus some time to check the command
 +
        (2) you could then wait for some time, say 5 ms, and send the sequence: geographic address, ACK (0x06), line-feed  <== time for
 +
module to get ready with answer
 +
        (3) wait for say 10 ms (equivalent to ~ 40 characters to be transferred back from the module)
 +
        (4) read the input buffer,
 +
              (4.1) if the input buffer has the sequence (ACK\n\r), the response is not ready yet <== go back to (2) & you can skip the
 +
5 ms because the sequence was received 10ms ago
 +
              (4.2) if the buffer has something that starts with ACK followed by the unique tag you sent with the command, the response
 +
has started. You then want to check
 +
that the last two entries in the buffer are \n\r which indicate the module finished transferring the data. If they are not there, there
 +
should be more data in the buffer or something got lost
 +
      Depending on the system (e.g. assuming that linux really handles correctly the buffer swap) you should be able to reduce some of
 +
the above time waits (e.g. get incomplete buffers that you can recognize because the sequence \n\r is not there). In principle, this
 +
should be able to cope with modules having problems answering
 +
 +
Most of my tests were actually done not with Linux (the test code was from 2009) but with a realtime system so I was able to use
 +
interrupts and the ATT* line. I did some tests
 +
with a small board computer using Linux so that I could poll a bit to determine if the ATT* line has been raised  but I did not complete
 +
them and I doubt I will get to it any time soon. One thing that I found in those tests however is that the default configuration of the
 +
serial port seems to change with the Linux version & flavor. I do not think that this is your problem but here it is what I had to add
 +
while setting the serial port attributes. You probably also want to make sure that the modem control lines are disabled to avoid any
 +
noise, but this should already be true.
 +
 +
  rs232_attr.c_cc[VTIME]= 0; /* read - inter-character timer unused */
 +
  rs232_attr.c_cc[VMIN] = 0; /* read - minimum number of characters */
 +
 +
Hope that it helps - Javier
 +
</pre>
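
The timing numbers in item (3) of the e-mail can be reproduced with a few lines.  The sketch assumes that "38.4 kB" means 38400 baud and that each character costs 10 bits on the wire (1 start + 8 data + 1 stop); the e-mail does not state the framing, but this assumption gives the quoted 0.260 ms.

```python
BAUD = 38400          # "38.4 kB" in the e-mail, read here as 38400 baud
BITS_PER_CHAR = 10    # assumed framing: 1 start + 8 data + 1 stop bit

def transfer_time_ms(n_chars, baud=BAUD, bits_per_char=BITS_PER_CHAR):
    """Time in milliseconds needed to move n_chars over the serial line."""
    return 1000.0 * n_chars * bits_per_char / baud

print("%.3f ms per character" % transfer_time_ms(1))       # 0.260 ms per character
print("%.1f ms for 40 characters" % transfer_time_ms(40))  # 10.4 ms for 40 characters
```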
 +
 +
 +
----------------------------------------------------------------------------------------------------------------------------------
 +
 +
Jan 13, 2014
 +
 +
I implemented a new kind of check that replaces checkACK in the "loop forever" of Javier's C code.  This loop is the ongoing dialog with the Perl Shim server.  The new check is called bob1ACK.  This code looks for a sequence  "0x06 + [N char] + \n \r", where [N char] is N characters that may result from a reply of the card.  In the simplest reply, N=0.  This would be an acknowledgement (ACK) sequence.  But for a reply with data, N can be larger, e.g. N=82.  If a reply has not finished yet, you get no \n and \r, but only a truncated string.  So one must read again until the \n and \r are found, and concatenate the readings to form the reply.  This is done now and seems to be fairly robust and fast.
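
The idea of reading until the terminator shows up can be sketched as follows.  This is a Python illustration of the technique only, not the bob1ACK C code itself; the function name, the read_chunk callback and the max_reads limit are assumptions made for the example.

```python
ACK = b"\x06"
END = b"\n\r"   # terminator order as written in the notes above

def read_reply(read_chunk, max_reads=50):
    """Concatenate chunks until one ACK ... \\n\\r frame is complete.

    read_chunk is any callable returning the bytes currently available
    (possibly empty).  Returns the N payload bytes between the ACK and
    the terminator, or None if no complete frame arrived in max_reads.
    """
    buf = b""
    for _ in range(max_reads):
        buf += read_chunk()
        start = buf.find(ACK)
        if start < 0:
            continue                      # nothing recognizable yet
        end = buf.find(END, start + 1)
        if end >= 0:
            return buf[start + 1:end]     # N = 0 means a bare ACK
    return None
```

A truncated reply simply leaves the terminator missing, so the loop reads again and appends; a real version would also sleep briefly between reads.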
 +
 +
<font color="blue">A problem with both old and new code:  sometimes if you do the operation (example: turn ON the crate) the command is ignored.  The solution seems to be to change a Target value for the HV on one channel.  This will "wake up" the code and it will do the operation (in this example: turn ON HV and also change the demand HV).</font> 
 +
 +
I wonder if this is due to the latencies built into the Shim server?  Indeed, playing with $PSUM_CACHE_TIME and $GS_STALE_TIME seems to affect this.  I think that "1" and "10", respectively, are ok.

I also think that GS_STALE_TIME should exceed PSUM_CACHE_TIME, or else the data may always be stale.

There is another cause for an operation to fail, but it looks like it is in the Java GUI:  if you press "HV ON" and then quickly "HV OFF", it will not change state.  Most users won't do that, so I think it's no big deal.
 +
 +
 +
Here are some speed data.
 +
 +
<pre>
This is for 10 HV cards in the DVCS HV crate.  Times in seconds.
Old code with delay = 200K, as used on HRS now.  New code with bob1ACK

                                       Old    New

start the GUI                          70     20

turn on/off HV                         40     10

change a HV value or enable/disable    20     10

It looks faster now.
</pre>
 +
 +
---------------------------------------------------------------------------------
 +
 +
Update Jan 17, 2014
 +
 +
In the past 2 days, I've been trying the new software on the R-HRS HV system.  Remarkably, it did not work on the Intel PC halladaq8, but it did work on the ahut1 PC.  I did not try yet on the Intel PC intelha3 because it's on the L-HRS and I don't want to interfere with detector checkout.
 +
 +
Investigating the failure, I found by using low-level queries of the HV crate that:  (1) it always acknowledged a command within 10 msec (normal and good); (2) it often did not give a data string within a reasonable amount of time, even 1 second; and (3) if I asked multiple times for data (using a write command), the cards would eventually give data which looked valid (changed with demand HV, etc).

Thus, as a "band-aid" solution, I made the query for data a loop which tries (until a timeout) to get data by repeatedly writing the request and reading back.  This worked, but with another twist:  the sleep time needs to be reasonably big, like 0.2 msec, or you don't get data, no matter how often you do a write/read sequence or whether you try to read repeatedly with small delays in between readings.  So there must be a sufficiently big delay between write and read.

And here's the kicker:  this delay can be much smaller on ahut1 (and apparently on clrlpc, the DVCS PC).  What's the difference between these computers?  Are Intel PCs "slow" somehow?

Next, I made the delay depend on the index of the loop.  Early in the loop, the code tries to be fast, and later in the loop it goes slower.  The result is that the code now works on all PCs:  it is fast on ahut1 and clrlpc, but slow on halladaq8 (an XVB601 Intel VME-based PC).  This has the ironic consequence of making the code even slower than before on halladaq8, since the first loop is wasted.  Hmmm.....
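
A loop whose delay grows with the loop index can be sketched like this.  It is a Python illustration of the idea, not the code in i2lchv_linux_dev.c; the function name, the callbacks and the delay values are all made up for the example.

```python
import time

def query_with_backoff(write_request, read_reply,
                       delays=(0.01, 0.05, 0.2, 0.2), sleep=time.sleep):
    """Write the request, wait, read; wait longer on each further attempt.

    write_request() sends the query and read_reply() returns the reply
    bytes, or b"" if nothing usable has arrived.  Returns a tuple
    (reply, attempts); reply is None if every attempt came back empty.
    """
    for attempt, delay in enumerate(delays, start=1):
        write_request()
        sleep(delay)                 # the delay sits between write and read
        reply = read_reply()
        if reply:
            return reply, attempt
    return None, len(delays)
```

On a fast PC the first, short delay already yields data; on a slow one the early attempts are wasted and the later, longer delays do the work, which matches the behavior described above.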
 +
 +
Latest version of the frontend C code is i2lchv_linux_dev.c
 +
 +
I'll put a copy in /mss/home/rom/i2lchv_linux_dev_17jan13.c
 +
 +
I am pondering this mystery ....
 +
 +
------------------------------------------------------------------------------------------
 +
 +
Jan 17, 2014 (cont)
 +
 +
Three other notes, while I remember them.
 +
 +
1. The version on clrlpc (looks for data structure and is fast) also works on ahut1.

2. The "band-aid" version mentioned above (loops with successive write/read) works as standalone C code (rs232 dialog) but NOT with the Java GUI:  it gives truncated strings, which lead to an exception.  This happens for servers on both halladaq8 and ahut1.

3. The initialization phase on the top crate, which has 15 cards, can lose a card.  Some cards are more easily lost than others, in particular the ones in slots 2, 4, and 7.  However, if I make the sleep time 0.2 msec during initialization, all cards are seen:  I saw no failure (lost card) in 25+ sessions.  This is an adequate solution since the initialization is done once, and long before the Java GUI tries to connect.  But it tells me something.
 +
 +
-------------------------------------------------------------------------------------------
 +
 +
Jan 17, 2014 (3rd entry)
 +
 +
Version status:
 +
 +
On ahut1
 +
 +
i2lchv_linux_save17jan14.c = what we've used for the past few months
 +
 +
i2lchv_linux_slow.c = pretty much the same; this version was on Intel.  (we'll need a unified version, obviously)
 +
 +
i2lchv_linux_dev_17jan14.c = "band-aid" development.  Has a bob2ACK in addition to bob1ACK.  This might be a place to start for further work.
 +
 +
i2lchv_linux_clrplc_16jan14.c = faster version developed on DVCS test stand.  It also works on ahut1.  I will use this for now.
 +
 +
i2lchv_linux_bob.c = working version at any given time.  Compile with "build". 
 +
 +
On halladaq8
 +
 +
i2lchv_linux_S0.c = working version for R-HRS halladaq8
 +
 +
i2lchv_linux_S2.c = working version for L-HRS intelha3 (DO NOT TOUCH, for now)
 +
 +
i2lchv_linux_slow.c = slow version which works on Intel PC.  Same as i2lchv_linux_S2.c modulo /dev/tty*
 +
 +
i2lchv_linux_dev.c = "band-aid" development.  See above.  Has bob2ACK, etc.
 +
 +
Whew !
 +
 +
--------------------------------------------------------------------------------------
 +
 +
Jan 21, 2014
 +
 +
And another version
 +
 +
i2lchv_linux_dev_18jan14.c
 +
 +
but this and the other "dev" version didn't work well on ahut1.  I think because some commands like LD only receive an ACK answer, and bob2 is looking for a longer string.  The bob1 code is better and the "golden" version is
 +
 +
i2lchv_linux_clrplc_16jan14.c
 +
 +
We'll use this on laptops.
 +
 +
Another important issue:  Brad Sawatzky tells me that the USB-to-Serial connections used on the Intel PCs are slow and buggy.  It is much better to have a PCI-to-Serial card.  I will look into getting those, but in the meantime I will deploy the laptops for the two HRS HV crates, and use Intel PCs as a backup only.
 +
 +
--------------------------------------------------------------------------------------
 +
 +
April 14, 2014
 +
 +
Lesson from the first few months:  a slower version of the i2lchv_linux_* code is best for long-term stability.  The software is in MSS in
 +
 +
/mss/home/rom/hv_ahut2_14apr14.tar
 +
 +
On RHEL6, I had to install some RPMs to get the Perl Shim server to work, as explained above in the relevant section.
 +
 +
--------------------------------------------------------------------------------------
 +
 +
Adding PCI serial cards.
 +
 +
The Multi I/O card (2 port) that Brad gave me works.  I had to get the driver from www.drivers-download.com and search for DL-0385102 and get the latest one.  This was put into
 +
 +
hvsrv2:/home/highv/sdriver
 +
 +
then
 +
<pre>
 +
[root@hvsrv2 Linux_v1.2]# pwd
 +
/home/highv/sdriver/Linux_v1.2
 +
 +
[root@hvsrv2 Linux_v1.2]# ./sysbas_mpdrv.v12.sh
 +
             
 +
================================================================
 +
SystemBase Multiport PCI/PCIe Board Installation
 +
Version : 4.1 revision: 2010-01-29
 +
contact: tech@sysbas.com
 +
================================================================
 +
1 board(s) installed
 +
Board No.1 : Multi-2 PCI (rev b0)
 +
/dev/ttyMP0 (RS232 , 16C105X)
 +
/dev/ttyMP1 (RS232 , 16C105X)
 +
 +
and these two devices worked as alternative serial ports
 +
</pre>
 +
 +
--------------------------------------------------
 +
 +
Raspberry PI and all that -- Apr 30, 2014
 +
 +
Brad says we should be polling an ATTN line which Jack put to flow control (DSR, I think).  Normal serial cables have TX, RX, and gnd.  This is a 4th line which must be run. 
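
On Linux, the state of the modem lines of a serial port can be read with the TIOCMGET ioctl, so polling such a line from a PC could look roughly like the sketch below.  It assumes the ATTN signal really is wired to DSR (the note above only says "I think"); the function names, timeout and poll interval are made up for the example.

```python
import fcntl
import struct
import termios
import time

def dsr_is_set(modem_bits):
    """True if the DSR bit is set in a TIOCMGET status word."""
    return bool(modem_bits & termios.TIOCM_DSR)

def read_modem_bits(fd):
    """Fetch the modem status lines of an open serial port (Linux)."""
    raw = fcntl.ioctl(fd, termios.TIOCMGET, struct.pack("I", 0))
    return struct.unpack("I", raw)[0]

def wait_for_dsr(fd, timeout=1.0, poll_interval=0.001, get_bits=read_modem_bits):
    """Poll until DSR goes high; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if dsr_is_set(get_bits(fd)):
            return True
        time.sleep(poll_interval)
    return False
```

Here fd would be the file descriptor of the opened serial device (for example /dev/ttyS0); get_bits is a parameter only so that the polling logic can be tried without hardware.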
 +
 +
The Raspberry PI should allow this too, but if we can already do it with PCs we should.
 +
 +
Apparently the beamline crate had this setup and was pretty fast.

Latest revision as of 16:43, 15 September 2014

R. Michaels, Nov 2013

Here is some extremely boring technical info about HV control. Some of it is probably obsolete by the time you see it. If you need to get some sleep, read on.

First, note these somewhat old URLs as a starting point

http://hallaweb.jlab.org/equipment/daq/HVhelp.html

http://hallaweb.jlab.org/equipment/daq/gen_slow_control.html

The above URLs will need to be modified and updated to provide User guidance. But they are a nice start.

A note about the HV software architecture:

1. "hvs" (aka "hvg") = Java GUI that provides User interface (author: Roman Pomatsalyuk)

2. LecroyHV_Shim = Perl server, listens on a telnet port, talks to hvs. This code emulates the obsolete (no-longer-existing) motherboards on the LeCroy 1458 HV crate (author: Brad Sawatzky). This server must run on the PC that is connected by RS232 to the HV crate.

3. i2lchv_linux_bob = Bob's version of Brad's version of Javier's code that does the low-level communication with the HV crate (original author: Javier Gomez, but don't blame him for my hacks). i2lchv_linux_bob talks via RS232 to the HV cards using a primitive "language". The LecroyHV_Shim Perl script connects to i2lchv_linux_bob via the "Expect" module of Perl.


Nov 12, 2013. Attempts to use Intel PC as the PC that is nearby the LeCroy HV crate.

from root on adaql1, see /root/diskless/i386/Centos5.8

./snapshot = directories local to the cpus. e.g. see /snapshot/intelha3 for intelha3.

./root = directory shared by all the cpus. However, not (yet) writeable by the cpus.

On the intel cpu, if you want to create or touch files, you must use the local filesystem.

/root/diskless/i386/Centos5.8/snapshot/intelha3/root (from adaql1)

appears as

/root (local filesystem on intelha3)


However, if you want to compile code that every intel PC uses, you must use

/root/diskless/i386/Centos5.8/root on adaql1

/shim (on intelha3) is the shared shim directory from /root/diskless/i386/Centos5.8/root/shim on adaql1


Nov 12, 2013. Attempting to run HV control on L-HRS.

Log in to intelha3 as root.

Reminder
/root/shim  is the LOCAL copy.  Must run Shim server here because this filesystem can be written to.
/shim is the SHARED copy (on adaql1).  Must compile here (on adaql1)

To check that the HV cards are seen via RS232, go to /root/shim and run /shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob

I run the code from /root/shim because this filesystem is writable, in case the code needs to produce output. The executable itself comes from /shim, which is the shared area.

A note: if this code needs to be recompiled, you must compile it in the shared disk area and then run the resulting executable from the local area.

If you see a list of HV cards when you run i2lchv_linux_bob, that is a good sign.

Next, run the Perl server: go to /root/shim/ and type /shim/LecroyHV_shim/LecroyHV_Shim (Note: I've attempted to automate this step so that it runs 24/7, even after rebooting the Intel PC; see below.)

On adaql1, as the adev account, do this: "cd ./slowc ; ./hvs LEFT"

How to modify the code for intelha3 or other Intel PCs

On adaql1, as root, work in /root/diskless/i386/Centos5.8/root/shim/LecroyHV_shim/LecroyHV_FE and related areas. Modify the code and compile it there.

On intelha3 the files appear as /shim/LecroyHV_shim/LecroyHV_FE (etc.)

Attempt to automate the startup of the Shim server so that users don't need to care.

On intelha3, there is a crontab entry for root

[root@intelha3 LecroyHV_shim]# crontab -l

# Start the shim server for HV
2,10,20,30,40,50 * * * * /shim/scripts/prepHV

The script prepHV checks if Shim is running. If it is, do nothing. If not, start it.



Notes about pre-requisites for making the Perl server work.

Needed Perl 5.8.8 or later install. On intel PC this appears as /perl after being put (on adaql1) in the shared area.

Needed RH5 or greater, to have correct glibc

/dev/ttyS0 permissions -- they get reset at boot.

Install the telnetd server, and allow telnet in /etc/xinetd.d (then restart xinetd)

yum install telnet-server

Kill cfengine, which tends to rather rudely delete /etc/xinetd.d/telnet Yes, telnet is an old, insecure protocol, but it's needed by this software. The Java code uses telnet to communicate with the Perl server.

/etc/xinetd.d/telnet needs "disable = no"

On ahut1 or ahut2, I have a cron script /root/scripts/prepHV which takes care of running the server automatically. It also periodically restores the "telnet" file mentioned above, and periodically restarts xinetd

Simple test of telnet server: if you are on, for example, ahut2 and can "telnet ahut2" (i.e. telnet into yourself), then yes, the server is running.

NOTE on getting Perl to work on a standard RHEL 6 i386 CUE level 2 install.

I had to install these modules to get Brad's shim server to work

rpm -ivf perl-IO-Tty-1.08-3.el6.i686.rpm

rpm -ivf perl-IO-Multiplex-1.13-1.el6.rf.noarch.rpm

rpm -ivf perl-Net-Server-0.97-7.el6.noarch.rpm

I think that's all I had to do to the default Perl installation on RHEL6, but there was more work needed to get the C code to compile

This is probably provided by some "devel" package, but whatever ...

[root@hvsrv1 include]# pwd
/usr/include
[root@hvsrv1 include]# mkdir readline
[root@hvsrv1 include]# cd readline
[root@hvsrv1 readline]# scp highv@hvsrv2:/usr/include/readline/all.tar .
highv@hvsrv2's password: 
all.tar                                                                          100%   70KB  70.0KB/s   00:00    
[root@hvsrv1 readline]# tar xvf all.tar 
chardefs.h
history.h
keymaps.h
readline.h
rlconf.h
rlstdc.h
rltypedefs.h
tilde.h

Also:

yum install ncurses-devel-5.7-3.20090208.el6.i686

Also
[root@hvsrv1 lib]# pwd
/lib
[root@hvsrv1 lib]# ln -s libreadline.so.6.0 libreadline.so

Nov 13, 2013

Adding a new Intel VME PC to the hall

add to /etc/dhcpd.conf on adaql2

host halladaq8
              {
              filename "linux-install/pxelinux.0"; # File location relative to /tftpboot/
              next-server 129.57.164.32;            # TFTP server
#              hardware ethernet 00:20:38:04:6D:12; # Client MAC XVB601
#              hardware ethernet 00:20:CE:F6:03:EE; # Client MAC XVR14
#              hardware ethernet 00:21:70:AF:3B:46; # PC Magali LPC Clermont
              hardware ethernet 00:20:38:04:6D:0E; # Right Intel CPU HV control XVB601
              fixed-address 129.57.164.15;          # Client IP
              }
}

Then need to restart:

/etc/rc.d/init.d/dhcp restart

On adaql1

Need to add IP to adaql1:/root/tftpboot/linux-install/pxelinux.cfg

8139A40F is the hex version of 129.57.164.15 = halladaq8
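
For the record, the file name is just the four octets of the IP written as two uppercase hex digits each (129 = 0x81, 57 = 0x39, 164 = 0xA4, 15 = 0x0F). A minimal sketch of that conversion in C follows; ip_to_pxe_name is a hypothetical helper name, not something in the HV software.

```c
#include <stdio.h>

/* Convert a dotted-quad IPv4 string into the 8-digit uppercase hex
 * file name used under pxelinux.cfg.
 * ip_to_pxe_name is a hypothetical helper, for illustration only.
 * Returns 0 on success, -1 if the address is malformed. */
int ip_to_pxe_name(const char *ip, char out[9])
{
    unsigned int a, b, c, d;
    char extra;

    /* Exactly four numbers and nothing after them. */
    if (sscanf(ip, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    snprintf(out, 9, "%02X%02X%02X%02X", a, b, c, d);
    return 0;
}
```

Calling ip_to_pxe_name("129.57.164.15", name) fills name with 8139A40F, matching the file above.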

What this file (8139A40F) looks like :

[root@adaql1 pxelinux.cfg]# pwd
/root/tftpboot/linux-install/pxelinux.cfg
[root@adaql1 pxelinux.cfg]# more 8139A40F
default Centos_5.8_32_bit

label Centos_5.8_32_bit
    kernel Centos_5.8_32_bit/vmlinuz
    append  initrd=Centos_5.8_32_bit/initrd.img root=/dev/ram0 init=disklessrc NFSROOT=129.57.164.32:/diskless/i386/Centos5.8 ramdisk_size=23904 ETHERNET=eth0 SNAPSHOT=intelha1 NISDOMAIN=CCHP vga=0x305 acpi=force vmalloc=256MB

Some things had to be enabled in BIOS, too. The network and the boot sequence -- Alex did it too fast.


Nov 14, 2013

Today I noticed that although intelha3 has /root from adaql1:/root/diskless/i386/Centos5.8/snapshot/intelha3/root/

halladaq8 does NOT have /root from adaql1:/root/diskless/i386/Centos5.8/snapshot/halladaq8/root/

Instead, halladaq8 has an independent /root which has nothing to do with adaql1. Weird.


Jan 2, 2014

Working on clrlpc (DVCS setup) in ~/bobdev, I have made some tests to optimize the Shim server. The diffs are below. Here are some speed comparisons so far.

Timing Tests for HV GUI (times in seconds)

Phase                          Portserver    Shim before mods    Shim after mods

Initializing                   17            100                 25

Turn on HV                     10            35                  15

Enable and set HV on 1 card    5 + typing    30 + typing         5 + typing

While this sounds nice, there is still a problem with the Shim software. There seems to be a probability "P" that a given command is ignored. "P" is small and hard to measure, but my impression is that before mods P = 0.05 and after the speedup mods, P = 0.1 or so. I have a few more ideas still to try.

Here are the diffs.

diff i2lchv_linux_bob.c ~/slowc/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob.c
17,19c17
< #define  SDELAY1    40000 /* checkACK delay (usec) */
< #define  SDELAY2    20000 /* pollDSR delay  (usec) */ 
< #define  SDELAY3    10000 /* main -- init loop over modules (usec) */
---
> #define  SDELAY     200000  /* in usecs (0.1 sec is stable for all but MC command (wants 0.2s), go figure) */
22c20
< #define  MAXPOLL    30    /* how many loops in pollDSR */
---
> #define  MAXPOLL    20
53c51
<   usleep(SDELAY1);
---
>   usleep(SDELAY);
99c97
<     usleep(SDELAY2);
---
>     usleep(SDELAY);
172c170
<     usleep(SDELAY3);  /* short wait for a response */
---
>     usleep(SDELAY);  /* short wait for a response */

and

dvcs@clrlpc.jlab.org> diff LecroyHV_Shim ~/slowc/shim/LecroyHV_shim/LecroyHV_Shim 
90c90
< my $PSUM_CACHE_TIME = 5;
---
> my $PSUM_CACHE_TIME = 10;
95c95
< my $GS_STALE_TIME = 25;
---
> my $GS_STALE_TIME = 15;

comments

1. The GS_STALE_TIME change seemed to alleviate the problem that the Shim got into a loop of GS being stale. Not sure if this is related to the probability "P" mentioned above.

2. I'm not sure if MAXPOLL should be adjusted. Making it very large causes "P" to increase, I think.

working ....


Update Jan 3, 2014

Brad emphasized that we should confirm that the DSR/busy line is being toggled in the "right" way. It should never toggle without a card being queried. Does it go high after receiving a command that will generate a response, or does it ever fail to go high ? Does it go low before the response is read out ?

I don't know how to check these things, but I assume: a) it could be checked with a scope; b) the DVCS hardware doesn't suffer from this since at the moment the crate works fine; c) not an issue with the crate if Jack did a good job; d) if a module is bad one should find it with low-level inquiries, then replace the module.

Brad also said that "updating stale GS" is nothing to worry about. Ok, good.

Latest set of changes is listed below. It is significantly faster. Note, reducing SDELAY1 appears to be risky.

Reliability issue: The probability "P" for a GUI operation to fail (i.e. simply not happen, requiring a retry) is zero for enabling / disabling a channel or setting a HV value. Good ! But this might be an accident -- see below.

However, for turning on or off the crate, P was initially 50% today. I think I improved this by turning off some more printlog in the Shim, and I think the remaining small P (about 10%) is depending on print statements in the Java GUI, as I noticed that if you press HV ON or OFF when the GUI is printing, it may not do the transition. But if the GUI is not printing, the transition is reliable (P = 0). Need to look at that.

If you do succeed in turning on or off HV, it takes about 20 sec, which is 5x faster than before but still twice as long as the Portserver version.

dvcs@clrlpc.jlab.org> diff LecroyHV_Shim ~/slowc/shim/LecroyHV_shim/LecroyHV_Shim
86d85
< my $verbose = 0;   # = 1 print verbosely
91c90
< my $PSUM_CACHE_TIME = 5;
---
> my $PSUM_CACHE_TIME = 10;
96c95
< my $GS_STALE_TIME = 10;
---
> my $GS_STALE_TIME = 15;
247,249c246
<     if ( $verbose == 1 ) {
<        printlog ("[$id - $peer]: $message");
<     }
---
>     printlog ("[$id - $peer]: $message");
252,254c249
<     if ( $verbose == 1 ) {
<       printlog ("[R] '$response'");
<     }
---
>     printlog ("[R] '$response'");
476,478c471
<   if ( $verbose == 1 ) {
<     printlog("HV1458_handler: '$full_command'");
<   }
---
>   printlog("HV1458_handler: '$full_command'");
813,815c806
<     if ( $verbose == 1 ) {
<       printlog( "Updating stale GS.\n" );
<     }
---
>     printlog( "Updating stale GS.\n" );
843,845c834
<     if ( $verbose == 1 ) {
<       printlog( "Using cached psum($lslot)\n" );
<     }
---
>     printlog( "Using cached psum($lslot)\n" );
1058,1060d1046
< 
<   printlog("Response to poll_HV_status() = '$resp'");
< 
1111,1114c1097
<   if ( $verbose == 1 ) {
<     printlog "generic_cmd($cmd, $lslot)";
<   }
< 
---
>   printlog "generic_cmd($cmd, $lslot)";
1122,1124c1105
<   if ( $verbose == 1 ) {
<     printlog "remcmd: $remcmd";
<   }
---
>   printlog "remcmd: $remcmd";
1211,1213c1192
<     if ( $verbose == 1 ) {
<       printlog("prop: $prop\n");
<     }
---
>     printlog("prop: $prop\n");

and

diff i2lchv_linux_bob.c ~/slowc/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob.c
<  * bob's version with O_NONBLOCK and 3 values of SDELAY
<  *       to try to optimize the speed. 
<  * i2lchv_linux_bob.c
---
>  * bob's version with O_NONBLOCK 
> * i2lchv_linux_bob.c
18,20c17
< #define  SDELAY1    50000 /* checkACK delay (usec) <=20K causes failure*/
< #define  SDELAY2    5000 /* pollDSR delay  (usec) */ 
< #define  SDELAY3    10000 /* main -- init loop over modules (usec) */
---
> #define  SDELAY     200000  /* in usecs (0.1 sec is stable for all but MC command (wants 0.2s), go figure) */
23c20
< #define  MAXPOLL    50    /* how many loops in pollDSR */
---
> #define  MAXPOLL    20
54c51
<   usleep(SDELAY1);
---
>   usleep(SDELAY);
100c97
<     usleep(SDELAY2);
---
>     usleep(SDELAY);
173c170
<     usleep(SDELAY3);  /* short wait for a response */
---
>     usleep(SDELAY);  /* short wait for a response */

Jan 10, 2014

Disappointingly, deploying the above code in Hall A on HRS did not help. The logic of waiting by a fixed delay is wrong. Instead, we should check in a loop for "good data". Good data is defined as starting and ending with specific characters. I will try this next. The following e-mail from Javier Gomez explains.

Bob,

Not sure that I can explain what is going on but here is what I recall,

(1) if a module gets the ACK sequence:  geographic address, ACK (0x06), line-feed
it will attempt to dump the contents of its output buffer or just send the sequence: ACK, line-feed, carriage-return

(2) if a module gets a command, it responds
        (1st) ACK, line-feed, carriage-return  to acknowledge it got a command
        (2nd) starts preparing the command response - when it is ready, it raises the ATT* line which Linux has hard time handling as interrupt
        (3rd) when the ATT* line goes up, the sequence: geographic address, ACK (0x06), line-feed needs to be sent for the module to start transferring the data back
        (4th) the response , besides the ACK character it should also have a tag sent with the command so that it can be identified in case that commands were concatenated.

(3) At the rate that the modules operate (38.4 kB), it takes 0.260 ms per character - a string of 40 characters, will take ~ 10.4 ms to transfer.

Options to go around Linux not being interrupted easily,

(a) send a command and wait before proceeding - I think that in the very old test code I  used two waits: 10 ms & 100 ms (the equivalent to SDELAY1)
        (1) 10 ms after sending a ACK sequence only (geographic address, ACK (0x06), line-feed)
        (2) 100 ms  guessing/waiting for the module to respond.
     this may have been reasonable for testing but if the module is late responding (e.g. busy dealing with what it thinks are changes in the voltages due to noise) then it may not answer in that window and things will get out of control because one is assuming the answer will be there. This is the method used in the code you have.

b) another possibility somewhat more cumbersome which I did not try at that time is,
        (1) shortly after sending a command  (0.260 ms per character X number of characters), the addressed module should respond with the ACK sequence (ACK\n\r) - 3 characters ~ 1 ms plus some time to check the command
        (2) you could then wait for some time, say 5 ms, and send the sequence: geographic address, ACK (0x06), line-feed  <== time for module to get ready with answer
        (3) wait for say 10 ms (equivalent to ~ 40 characters to be transferred back from the module)
        (4) read the input buffer,
               (4.1) if the input buffer has the sequence (ACK\n\r), the response is not ready yet <== go back to (2) & you can skip the 5 ms because the sequence was received 10ms ago
               (4.2) if the buffer has something that starts with ACK followed by the unique tag you sent with the command, the response has started. You then want to check that the last two entries in the buffer are \n\r which indicate the module finished transferring the data. If they are not there, there should be more data in the buffer or something got lost
       Depending on the system (e.g. assuming that linux really handles correctly the buffer swap) you should be able to reduce some of the above time waits (e.g. get incomplete buffers that you can recognize because the sequence \n\r is not there). In principle, this should be able to cope with modules having problems answering

Most of my tests were actually done not with Linux (the test code was from 2009) but with a realtime system so I was able to use interrupts and the ATT* line. I did some tests with a small board computer using Linux so that I could poll a bit to determine if the ATT* line has been raised  but I did not complete them and I doubt I will get to it any time soon. One thing that I found in those tests however is that the default configuration of the serial port seems to change with the Linux version & flavor. I do not think that this is your problem but here it is what I had to add while setting the serial port attributes. You probably also want to make sure that the modem control lines are disabled to avoid any noise, but this should already be true.

  rs232_attr.c_cc[VTIME]= 0; /* read - inter-character timer unused */
  rs232_attr.c_cc[VMIN] = 0; /* read - minimum number of characters */
 
Hope that it helps - Javier
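
A quick check of the timing numbers in Javier's mail. I am assuming "38.4 kB" means 38400 baud and that each character costs 10 bits on the wire (1 start + 8 data + 1 stop); neither assumption is stated in the mail. The function name serial_transfer_ms is made up for this note.

```c
/* Time in milliseconds to move nchars characters over a serial line.
 * bits_per_char is the number of bits on the wire per character
 * (10 for 8N1 framing, which is an assumption here).
 * serial_transfer_ms is a hypothetical helper, for illustration only. */
double serial_transfer_ms(long baud, int bits_per_char, int nchars)
{
    return 1000.0 * (double)bits_per_char * (double)nchars / (double)baud;
}
```

With these assumptions, serial_transfer_ms(38400, 10, 1) is about 0.26 ms and serial_transfer_ms(38400, 10, 40) is about 10.4 ms, in agreement with the numbers quoted. So the 100 to 200 ms fixed delays in the old code are dominated by waiting for the module, not by the transfer itself.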



Jan 13, 2014

I implemented a new kind of check that replaces checkACK in the "loop forever" of Javier's C code. This loop is the ongoing dialog with the Perl Shim server. The new check is called bob1ACK. This code looks for a sequence "0x06 + [N char] + \n \r" where [N char] is N characters that may result from a reply of the card. In the simplest reply, N=0. This would be an acknowledgement (ACK) sequence. But for a reply, N can be larger, e.g. N=82. If a reply has not finished yet, you get no \n and \r, but only a truncated string. So, one must read again until the \n and \r are found, and one must concatenate the readings to form the reply. This is done now and seems to be fairly robust and fast.
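
The framing check described above can be sketched as follows. This is NOT the actual bob1ACK code, just a minimal illustration of the idea under the stated framing (0x06, then N characters, then \n \r). The names classify_reply and append_and_classify are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ACK_CHAR 0x06

enum reply_state {
    REPLY_INCOMPLETE,  /* no \n \r terminator yet: read again and append */
    REPLY_ACK_ONLY,    /* 0x06 + \n \r, i.e. N = 0                       */
    REPLY_DATA,        /* 0x06 + N characters + \n \r, with N > 0        */
    REPLY_BAD          /* buffer does not start with 0x06, or overflow   */
};

/* Classify what has been accumulated so far (hypothetical helper). */
enum reply_state classify_reply(const char *buf, size_t len)
{
    if (len == 0)
        return REPLY_INCOMPLETE;
    if (buf[0] != ACK_CHAR)
        return REPLY_BAD;
    if (len < 3 || buf[len - 2] != '\n' || buf[len - 1] != '\r')
        return REPLY_INCOMPLETE;
    return (len == 3) ? REPLY_ACK_ONLY : REPLY_DATA;
}

/* Concatenate one more read onto the accumulated reply and classify
 * the result. acc has capacity cap; *acc_len is updated in place. */
enum reply_state append_and_classify(char *acc, size_t *acc_len, size_t cap,
                                     const char *chunk, size_t chunk_len)
{
    if (chunk_len > cap - *acc_len)
        return REPLY_BAD;              /* would overflow the buffer */
    memcpy(acc + *acc_len, chunk, chunk_len);
    *acc_len += chunk_len;
    return classify_reply(acc, *acc_len);
}
```

The caller keeps reading and appending while the state is REPLY_INCOMPLETE, up to some timeout. One caveat of this simple version: if a read happens to end exactly on a \n \r that sits in the middle of a multi-line reply, it would be taken as the end of the reply.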

A problem with both old and new code: sometimes if you do the operation (example: turn ON the crate) the command is ignored. The solution seems to be to change a Target value for the HV on one channel. This will "wake up" the code and it will do the operation (in this example: turn ON HV and also change the demand HV).

I wonder if this is due to the latencies built into the Shim server ? Indeed, playing with the $PSUM_CACHE_TIME and $GS_STALE_TIME seems to affect this. I think that "1" and "10" are ok.

I also think that GS_STALE_TIME should exceed PSUM_CACHE_TIME, or else the data may always be stale.

There is another cause for an operation to fail, but it looks like it is in the Java GUI. If you press "HV ON" and quickly "HV OFF", it will not change state. Most users won't do that, so I think it's no big deal.


Here are some speed data.


This is for 10 HV cards in the DVCS HV crate.  Times in seconds.
Old code with delay = 200K, as used on HRS now.  New code with bob1ACK

                   Old          New

start the GUI:      70           20

turn on/off HV      40           10

change a HV value   20           10
or enable/disable

It looks faster now.

Update Jan 17, 2014

In the past 2 days, I've been trying the new software on the R-HRS HV system. Remarkably, it did not work on the Intel PC halladaq8, but it did work on the ahut1 PC. I did not try yet on the Intel PC intelha3 because it's on the L-HRS and I don't want to interfere with detector checkout.

Investigating the failure, I found by using low-level queries of the HV crate that: (1) it always acknowledged a command within 10 msec (normal and good); (2) it often did not give a data string within a reasonable amount of time, even 1 second; and (3) if I asked multiple times for data (using a write command), the cards would eventually give data which looked valid (changed with demand HV, etc).

Thus, as a "band-aid" solution, I made the query for data be a loop which tried (until a timeout) to get data by repeatedly writing the request and reading back. This worked, but with another twist: The sleep time needs to be reasonably big, like 0.2 msec, or you don't get data, no matter how often you do a write/read sequence or if you try to read repeatedly with small delays in between readings. So, there must be a sufficiently big delay between write and read.

And here's the kicker: This delay can be much smaller on ahut1 (and apparently on clrlpc, the DVCS PC). What's the difference between these computers ? Are Intel PCs "slow" somehow ?

Next, I made the delay depend on the index of the loop. Early in the loop, the code tries to be fast, and later in the loop it goes slower. The result is that the code works on all PCs now: it is fast on ahut1 and clrlpc, but slow on halladaq8 (an XVB601 Intel VME-based PC). This has the ironic consequence of making the code even slower than before on halladaq8 since the first loop is wasted. Hmmm.....
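
A rough sketch of the "delay depends on loop index" idea, for anyone picking this up later. This is not a copy of i2lchv_linux_dev.c: the delay values, the function names, and the callback interface are all made up for illustration.

```c
#include <unistd.h>

/* Hypothetical schedule: short delays first, longer ones later.
 * The numbers are illustrative, not the values in the real code. */
unsigned int delay_for_attempt(int attempt)
{
    static const unsigned int delays_usec[] = { 5000, 20000, 50000, 200000 };
    const int n = (int)(sizeof delays_usec / sizeof delays_usec[0]);

    if (attempt < 0)
        attempt = 0;
    if (attempt >= n)
        attempt = n - 1;   /* stay at the slowest setting */
    return delays_usec[attempt];
}

/* One step of the serial dialog. For the read step, return 1 if a
 * complete reply was obtained and 0 otherwise. */
typedef int (*exchange_fn)(void *ctx);

/* Repeat write -> wait -> read until a reply arrives or we give up.
 * Returns the index of the attempt that succeeded, or -1 on timeout. */
int query_with_backoff(exchange_fn send_request, exchange_fn read_reply,
                       void *ctx, int max_attempts)
{
    for (int i = 0; i < max_attempts; i++) {
        send_request(ctx);              /* write the request again  */
        usleep(delay_for_attempt(i));   /* delay between write/read */
        if (read_reply(ctx))
            return i;
    }
    return -1;
}
```

On a fast machine the first attempt succeeds and the short delay is all you pay. On a slow one the early attempts are wasted and the total time is the sum of the delays up to the attempt that works, which is the penalty noted above for halladaq8.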

Latest version of the frontend C code is i2lchv_linux_dev.c

I'll put a copy in /mss/home/rom/i2lchv_linux_dev_17jan13.c

I am pondering this mystery ....


Jan 17, 2014 (cont)

Three other notes, while I remember them.

1. The version on clrlpc (looks for data structure and is fast) also works on ahut1.

2. The "band-aid" version mentioned above (loops with successive write/read) works as a standalone C code (rs232 dialog) but NOT with the Java GUI -- it gives truncated strings which lead to an exception. This happens for servers on both halladaq8 and ahut1.

3. The initialization phase on the top crate, which has 15 cards, can lose a card. Some cards are more easily lost than others, in particular the ones in slots 2, 4, and 7. However, if I make the sleep time 0.2 msec during initialization, all cards are seen. I saw no failure (lost card) in 25+ sessions. This is an adequate solution since the initialization is done once, and long before the Java GUI tries to connect. But it tells me something.


Jan 17, 2014 (3rd entry)

Version status:

On ahut1

i2lchv_linux_save17jan14.c = what we've used for the past few months

i2lchv_linux_slow.c = pretty much the same; this version was on Intel. (we'll need a unified version, obviously)

i2lchv_linux_dev_17jan14.c = "band-aid" development. Has a bob2ACK in addition to bob1ACK. This might be a place to start for further work.

i2lchv_linux_clrplc_16jan14.c = faster version developed on DVCS test stand. It also works on ahut1. I will use this for now.

i2lchv_linux_bob.c = working version at any given time. Compile with "build".

On halladaq8

i2lchv_linux_S0.c = working version for R-HRS halladaq8

i2lchv_linux_S2.c = working version for L-HRS intelha3 (DO NOT TOUCH, for now)

i2lchv_linux_slow.c = slow version which works on Intel PC. Same as i2lchv_linux_S2.c modulo /dev/tty*

i2lchv_linux_dev.c = "band-aid" development. See above. Has bob2ACK, etc.

Whew !


Jan 21, 2014

And another version

i2lchv_linux_dev_18jan14.c

but this and the other "dev" version didn't work well on ahut1. I think because some commands like LD only receive an ACK answer, and bob2 is looking for a longer string. The bob1 code is better and the "golden" version is

i2lchv_linux_clrplc_16jan14.c

We'll use this on laptops.

Another important issue: Brad Sawatzky tells me that the USB-to-Serial connections used on the Intel PCs are slow and buggy. It is much better to have a PCI-to-Serial card. I will look into getting those, but in the meantime I will deploy the laptops for the two HRS HV crates, and use Intel PCs as a backup only.


April 14, 2014

Lesson from the first few months: a slower version of the i2lchv_linux_* code is best for long-term stability. The software is in MSS in

/mss/home/rom/hv_ahut2_14apr14.tar

On RHEL6, I had to install some RPMs to get the Perl Shim server to work; this is explained above in the relevant section.


Adding PCI serial cards.

The Multi I/O card (2 port) that Brad gave me works. I had to get the driver from www.drivers-download.com and search for DL-0385102 and get the latest one. This was put into

hvsrv2:/home/highv/sdriver

then

[root@hvsrv2 Linux_v1.2]# pwd
/home/highv/sdriver/Linux_v1.2

[root@hvsrv2 Linux_v1.2]# ./sysbas_mpdrv.v12.sh 
              
================================================================
	SystemBase Multiport PCI/PCIe Board Installation	
	Version : 4.1	revision: 2010-01-29
		contact: tech@sysbas.com
================================================================
 1 board(s) installed 
 Board No.1 : Multi-2 PCI (rev b0)
	/dev/ttyMP0 (RS232 , 16C105X)
	/dev/ttyMP1 (RS232 , 16C105X)

and these two devices worked as alternative serial ports.

Raspberry Pi and all that -- Apr 30, 2014

Brad says we should be polling an ATTN line which Jack put to flow control (DSR, I think). Normal serial cables have TX, RX, and gnd. This is a 4th line which must be run.
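
For reference, on Linux the state of the modem-control lines can be read with the TIOCMGET ioctl. Below is a minimal sketch of polling DSR this way, assuming the ATTN line really is wired to DSR (which I have not verified). The function names are made up; the existing pollDSR in the frontend code may do it differently.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Read the modem-control lines of an open serial port.
 * Returns 1 if DSR is asserted, 0 if not, -1 if the ioctl fails. */
int dsr_is_asserted(int fd)
{
    int bits = 0;

    if (ioctl(fd, TIOCMGET, &bits) < 0)
        return -1;
    return (bits & TIOCM_DSR) ? 1 : 0;
}

/* Poll DSR up to max_polls times, sleeping poll_usec between polls.
 * Returns 1 when DSR is seen, 0 on timeout, -1 on error.
 * (Hypothetical helper; name and interface are illustrative.) */
int wait_for_dsr(int fd, int max_polls, unsigned int poll_usec)
{
    for (int i = 0; i < max_polls; i++) {
        int state = dsr_is_asserted(fd);
        if (state != 0)
            return state;   /* asserted (1) or error (-1) */
        usleep(poll_usec);
    }
    return 0;
}
```

Here fd would be the descriptor from open() on the serial device, e.g. /dev/ttyS0 or /dev/ttyMP0. The point is that the code waits for the module to say it is ready, instead of guessing with a fixed sleep.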

The Raspberry Pi should allow this too, but if we can already do it with PCs we should.

Apparently the beamline crate had this setup and was pretty fast.