Difference between revisions of "HV technical"
Revision as of 15:17, 13 January 2014
R. Michaels, Nov 2013
Here is some extremely boring technical info about HV control. If you need to get some sleep, read on.
First, note these somewhat old URLs as a starting point
http://hallaweb.jlab.org/equipment/daq/HVhelp.html
http://hallaweb.jlab.org/equipment/daq/gen_slow_control.html
The above URLs will need to be modified and updated to provide User guidance. But they are a nice start.
A note about the HV software architecture:
1. "hvs" (aka "hvg") = Java GUI that provides User interface (author: Roman Pomatsalyuk)
2. LecroyHV_Shim = Perl server, listens on a telnet port, talks to hvs. This code emulates the obsolete (no-longer-existing) motherboards on the LeCroy 1458 HV crate (author: Brad Sawatzky). This server must run on the PC that is connected by RS232 to the HV crate.
3. i2lchv_linux_bob = Bob's version of Brad's version of Javier's code that does the low-level communication with the HV crate (original author: Javier Gomez, but don't blame him for my hacks). i2lchv_linux_bob talks via RS232 to the HV cards using a primitive "language". The LecroyHV_Shim Perl script connects to i2lchv_linux_bob via the "Expect" module of Perl.
Nov 12, 2013. Attempts to use Intel PC as the PC that is nearby the LeCroy HV crate.
from root on adaql1, see /root/diskless/i386/Centos5.8
./snapshot = directories local to the cpus. e.g. see /snapshot/intelha3 for intelha3.
./root = directory shared by all the cpus. However, not (yet) writeable by the cpus.
On the intel cpu, if you want to create or touch files, you must use the local filesystem.
/root/diskless/i386/Centos5.8/snapshot/intelha3/root (from adaql1)
appears as
/root (local filesystem on intelha3)
However, if you want to compile code that every intel PC uses, you must use
/root/diskless/i386/Centos5.8/root on adaql1
/shim (on intelha3) is the shared shim directory from /root/diskless/i386/Centos5.8/root/shim on adaql1
Nov 12, 2013. Attempting to run HV control on L-HRS.
Login to intelha3 as root.
Reminder: /root/shim is the LOCAL copy. The Shim server must run here because this filesystem can be written to. /shim is the SHARED copy (on adaql1). Compilation must be done there (on adaql1).
To check that HV cards are seen via RS232, go to /root/shim and run /shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob
I'm running code in /root/shim because if it needs to make output, this filesystem allows it. But the code comes from /shim, which is the shared area.
A note: if this code needs to be recompiled, you must do the compilation on the shared disk area and then use the exec in the local area.
If when you run i2lchv_linux_bob you see a list of HV cards, then it's a good sign.
Next, run the Perl Server: go to /root/shim/ and type /shim/LecroyHV_shim/LecroyHV_Shim (Note, I've attempted to automate this step, so it is running 24/7 even after rebooting the intel PC. see below)
On adaql1, as adev account, do this: "cd ./slowc ; ./hvs LEFT"
How to modify programming on intelha3 or other intel PCs
On adaql1 as root, in /root/diskless/i386/Centos5.8/root/shim/LecroyHV_shim/LecroyHV_FE and related areas. Modify code and compile here.
On intelha3 the files appear as /shim/LecroyHV_shim/LecroyHV_FE (etc)
Attempt to automate the startup of Shim server so that users don't need to care.
On intelha3, have a crontab entry for root
[root@intelha3 LecroyHV_shim]# crontab -l
# Start the shim server for HV
2,10,20,30,40,50 * * * * /shim/scripts/prepHV
The script prepHV checks if Shim is running. If it is, do nothing. If not, start it.
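The check-then-start logic can be sketched as follows. The prepHV script itself is not reproduced on this page, so this is only an illustration in Python: the function names, the use of `pgrep -f`, and the choice of /root/shim as working directory are assumptions, not the actual script.

```python
import subprocess

SHIM_PATH = "/shim/LecroyHV_shim/LecroyHV_Shim"  # server path quoted earlier on this page
RUN_DIR = "/root/shim"                            # local, writable directory (assumed working dir)


def shim_is_running(pattern="LecroyHV_Shim"):
    """Return True if some process command line matches pattern (via pgrep -f)."""
    result = subprocess.run(["pgrep", "-f", pattern],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def start_shim():
    """Start the Shim server in the background from the writable directory."""
    subprocess.Popen([SHIM_PATH], cwd=RUN_DIR)


def ensure_running(is_running=shim_is_running, start=start_shim):
    """Start the server only if it is not already running.

    Returns True if a start was attempted, False if nothing was done.
    """
    if is_running():
        return False
    start()
    return True
```

Calling `ensure_running()` from cron at the minutes listed in the crontab entry would give the same "do nothing if it is already up" behavior.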
Notes about pre-requisites for making the Perl server work.
Needed Perl 5.8.8 or later installed. On the intel PC this appears as /perl after being put (on adaql1) in the shared area.
Needed RH5 or greater, to have correct glibc
/dev/ttyS0 permissions -- they get reset at boot.
Install the telnet server, allow telnet in /etc/xinetd.d, and restart xinetd
yum install telnet-server
Kill cfengine, which tends to rather rudely delete /etc/xinetd.d/telnet. Yes, telnet is an old, insecure protocol, but it's needed by this software. The Java code uses telnet to communicate with the Perl server.
/etc/xinetd.d/telnet needs "disable = no"
On ahut1 or ahut2, I have a cron script /root/scripts/prepHV which takes care of running the server automatically. It also periodically restores the "telnet" file mentioned above, and periodically restarts xinetd
Simple test of the telnet server: if you are on, for example, ahut2 and can "telnet ahut2" (i.e. telnet into yourself), then yes, the server is running.
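The same test can be scripted. A minimal sketch, assuming the telnet server listens on the standard TCP port 23; the function name `port_is_open` is made up for illustration. It only shows that something accepts connections on that port, not that a login would succeed.

```python
import socket


def port_is_open(host, port=23, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, timeout, name lookup failure, ...
        return False
```

For example, `port_is_open("ahut2")` run on ahut2 corresponds to the "telnet into yourself" check.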
Nov 13, 2013
Adding a new Intel VME PC to the hall
add to /etc/dhcpd.conf on adaql2
host halladaq8 {
    filename "linux-install/pxelinux.0";      # File location relative to /tftpboot/
    next-server 129.57.164.32;                # TFTP server
    # hardware ethernet 00:20:38:04:6D:12;    # Client MAC XVB601
    # hardware ethernet 00:20:CE:F6:03:EE;    # Client MAC XVR14
    # hardware ethernet 00:21:70:AF:3B:46;    # PC Magali LPC Clermont
    hardware ethernet 00:20:38:04:6D:0E;      # Right Intel CPU HV control XVB601
    fixed-address 129.57.164.15;              # Client IP
}
}
Then need to restart:
/etc/rc.d/init.d/dhcp restart
On adaql1
Need to add IP to adaql1:/root/tftpboot/linux-install/pxelinux.cfg
8139A40F is the hex version of 129.57.164.15 = halladaq8
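The file name is just the four octets of the IP address written as two hex digits each. A small sketch of the conversion (the function name `ip_to_pxe_hex` is made up for illustration):

```python
def ip_to_pxe_hex(ip):
    """Convert a dotted-quad IPv4 address to the 8-digit uppercase hex file name."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(o < 0 or o > 255 for o in octets):
        raise ValueError("not a valid IPv4 address: %r" % ip)
    return "".join("%02X" % o for o in octets)


print(ip_to_pxe_hex("129.57.164.15"))  # prints 8139A40F
```

Here 129 = 0x81, 57 = 0x39, 164 = 0xA4 and 15 = 0x0F.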
What this file (8139A40F) looks like :
[root@adaql1 pxelinux.cfg]# pwd
/root/tftpboot/linux-install/pxelinux.cfg
[root@adaql1 pxelinux.cfg]# more 8139A40F
default Centos_5.8_32_bit
label Centos_5.8_32_bit
    kernel Centos_5.8_32_bit/vmlinuz
    append initrd=Centos_5.8_32_bit/initrd.img root=/dev/ram0 init=disklessrc NFSROOT=129.57.164.32:/diskless/i386/Centos5.8 ramdisk_size=23904 ETHERNET=eth0 SNAPSHOT=intelha1 NISDOMAIN=CCHP vga=0x305 acpi=force vmalloc=256MB
Some things had to be enabled in BIOS, too. The network and the boot sequence -- Alex did it too fast.
Nov 14, 2013
Today I noticed that although intelha3 has /root from adaql1:/root/diskless/i386/Centos5.8/snapshot/intelha3/root/
halladaq8 does NOT have /root from adaql1:/root/diskless/i386/Centos5.8/snapshot/halladaq8/root/
Instead, halladaq8 has an independent /root which has nothing to do with adaql1. Weird.
Jan 2, 2014
Working on clrlpc (DVCS setup) in ~/bobdev, I have made some tests to optimize the Shim server. The diffs are below. Here are some speed comparisons so far.
Timing Tests for HV GUI (time in seconds)

Phase                          Portserver    Shim before mods    Shim after mods
Initializing                   17            100                 25
Turn on HV                     10            35                  15
Enable and set HV on 1 card    5 + typing    30 + typing         5 + typing
While this sounds nice, there is still a problem with the Shim software. There seems to be a probability "P" that a given command is ignored. "P" is small and hard to measure, but my impression is that before mods P = 0.05 and after the speedup mods, P = 0.1 or so. I have a few more ideas still to try.
Here are the diffs.
diff i2lchv_linux_bob.c ~/slowc/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob.c
17,19c17
< #define SDELAY1 40000 /* checkACK delay (usec) */
< #define SDELAY2 20000 /* pollDSR delay (usec) */
< #define SDELAY3 10000 /* main -- init loop over modules (usec) */
---
> #define SDELAY 200000 /* in usecs (0.1 sec is stable for all but MC command (wants 0.2s), go figure) */
22c20
< #define MAXPOLL 30 /* how many loops in pollDSR */
---
> #define MAXPOLL 20
53c51
< usleep(SDELAY1);
---
> usleep(SDELAY);
99c97
< usleep(SDELAY2);
---
> usleep(SDELAY);
172c170
< usleep(SDELAY3); /* short wait for a response */
---
> usleep(SDELAY); /* short wait for a response */
and
dvcs@clrlpc.jlab.org> diff LecroyHV_Shim ~/slowc/shim/LecroyHV_shim/LecroyHV_Shim
90c90
< my $PSUM_CACHE_TIME = 5;
---
> my $PSUM_CACHE_TIME = 10;
95c95
< my $GS_STALE_TIME = 25;
---
> my $GS_STALE_TIME = 15;
comments
1. The GS_STALE_TIME change seemed to alleviate the problem that the Shim got into a loop of GS being stale. Not sure if this is related to the probability "P" mentioned above.
2. I'm not sure if MAXPOLL should be adjusted. Making it very large causes "P" to increase, I think.
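To get a feeling for what MAXPOLL does to the timing, here is a rough estimate of the longest time one call to pollDSR can take. It assumes pollDSR loops at most MAXPOLL times and sleeps one delay per loop; that is inferred from the #define comments in the diff, since the C source itself is not shown here.

```python
def worst_case_poll_seconds(maxpoll, delay_usec):
    """Upper bound on time spent in a poll loop: maxpoll iterations of delay_usec each."""
    return maxpoll * delay_usec / 1e6


# Values taken from the first i2lchv_linux_bob.c diff above
before = worst_case_poll_seconds(20, 200000)  # old code: MAXPOLL=20, SDELAY=200000
after = worst_case_poll_seconds(30, 20000)    # modified code: MAXPOLL=30, SDELAY2=20000
print(before, after)  # prints 4.0 0.6
```

Under that assumption, raising MAXPOLL stretches the time a single unanswered poll can block, which may be one reason a very large value does not help.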
working ....
Update Jan 3, 2014
Brad emphasized that we should confirm that the DSR/busy line is being toggled in the "right" way, i.e. never without a card being queried. Does it go high after receiving a command that will generate a response? Or does it ever fail to go high? Does it go low before the response is read out?
I don't know how to check these things, but I assume: a) it could be checked with a scope; b) the DVCS hardware doesn't suffer from this since at the moment the crate works fine; c) not an issue with the crate if Jack did a good job; d) if a module is bad one should find it with low-level inquiries, then replace the module.
Brad also said that "updating stale GS" is nothing to worry about. Ok, good.
Latest set of changes is listed below. It is significantly faster. Note, reducing SDELAY1 appears to be risky.
Reliability issue: The probability "P" for a GUI operation to fail (i.e. simply not happen, requiring to try again) is zero for enabling / disabling a channel or setting a HV value. Good ! But might be an accident -- see below.
However, for turning on or off the crate, P was initially 50% today. I think I improved this by turning off some more printlog in the Shim, and I think the remaining small P (about 10%) is depending on print statements in the Java GUI, as I noticed that if you press HV ON or OFF when the GUI is printing, it may not do the transition. But if the GUI is not printing, the transition is reliable (P = 0). Need to look at that.
If you do succeed in turning on or off HV, it takes about 20 sec, which is 5x faster than before but still twice as long as the Portserver version.
dvcs@clrlpc.jlab.org> diff LecroyHV_Shim ~/slowc/shim/LecroyHV_shim/LecroyHV_Shim
86d85
< my $verbose = 0; # = 1 print verbosely
91c90
< my $PSUM_CACHE_TIME = 5;
---
> my $PSUM_CACHE_TIME = 10;
96c95
< my $GS_STALE_TIME = 10;
---
> my $GS_STALE_TIME = 15;
247,249c246
< if ( $verbose == 1 ) {
< printlog ("[$id - $peer]: $message");
< }
---
> printlog ("[$id - $peer]: $message");
252,254c249
< if ( $verbose == 1 ) {
< printlog ("[R] '$response'");
< }
---
> printlog ("[R] '$response'");
476,478c471
< if ( $verbose == 1 ) {
< printlog("HV1458_handler: '$full_command'");
< }
---
> printlog("HV1458_handler: '$full_command'");
813,815c806
< if ( $verbose == 1 ) {
< printlog( "Updating stale GS.\n" );
< }
---
> printlog( "Updating stale GS.\n" );
843,845c834
< if ( $verbose == 1 ) {
< printlog( "Using cached psum($lslot)\n" );
< }
---
> printlog( "Using cached psum($lslot)\n" );
1058,1060d1046
<
< printlog("Response to poll_HV_status() = '$resp'");
<
1111,1114c1097
< if ( $verbose == 1 ) {
< printlog "generic_cmd($cmd, $lslot)";
< }
<
---
> printlog "generic_cmd($cmd, $lslot)";
1122,1124c1105
< if ( $verbose == 1 ) {
< printlog "remcmd: $remcmd";
< }
---
> printlog "remcmd: $remcmd";
1211,1213c1192
< if ( $verbose == 1 ) {
< printlog("prop: $prop\n");
< }
---
> printlog("prop: $prop\n");
and
diff i2lchv_linux_bob.c ~/slowc/shim/LecroyHV_shim/LecroyHV_FE/i2lchv_linux_bob.c
< * bob's version with O_NONBLOCK and 3 values of SDELAY
< * to try to optimize the speed.
< * i2lchv_linux_bob.c
---
> * bob's version with O_NONBLOCK
> * i2lchv_linux_bob.c
18,20c17
< #define SDELAY1 50000 /* checkACK delay (usec) <=20K causes failure*/
< #define SDELAY2 5000 /* pollDSR delay (usec) */
< #define SDELAY3 10000 /* main -- init loop over modules (usec) */
---
> #define SDELAY 200000 /* in usecs (0.1 sec is stable for all but MC command (wants 0.2s), go figure) */
23c20
< #define MAXPOLL 50 /* how many loops in pollDSR */
---
> #define MAXPOLL 20
54c51
< usleep(SDELAY1);
---
> usleep(SDELAY);
100c97
< usleep(SDELAY2);
---
> usleep(SDELAY);
173c170
< usleep(SDELAY3); /* short wait for a response */
---
> usleep(SDELAY); /* short wait for a response */
Jan 10, 2014
Disappointingly, deploying the above code in Hall A on HRS did not help. The logic of waiting by a fixed delay is wrong. Instead, we should check in a loop for "good data". Good data is defined as starting and ending with specific characters. I will try this next. The following e-mail from Javier Gomez explains.
Bob,

Not sure that I can explain what is going on but here is what I recall,

(1) if a module gets the ACK sequence: geographic address, ACK (0x06), line-feed
    it will attempt to dump the contents of its output buffer or just send the sequence: ACK, line-feed, carriage-return
(2) if a module gets a command, it responds
    (1st) ACK, line-feed, carriage-return to acknowledge it got a command
    (2nd) starts preparing the command response - when it is ready, it raises the ATT* line which Linux has hard time handling as interrupt
    (3rd) when the ATT* line goes up, the sequence: geographic address, ACK (0x06), line-feed needs to be sent for the module to start transferring the data back
    (4th) the response, besides the ACK character it should also have a tag sent with the command so that it can be identified in case that commands were concatenated.
(3) At the rate that the modules operate (38.4 kB), it takes 0.260 ms per character - a string of 40 characters, will take ~ 10.4 ms to transfer.

Options to go around Linux not being interrupted easily,

(a) send a command and wait before proceeding - I think that in the very old test code I used two waits: 10 ms & 100 ms (the equivalent to SDELAY1)
    (1) 10 ms after sending a ACK sequence only (geographic address, ACK (0x06), line-feed)
    (2) 100 ms guessing/waiting for the module to respond.
    this may have been reasonable for testing but if the module is late responding (e.g. busy dealing with what it thinks are changes in the voltages due to noise) then it may not answer in that window and things will get out of control because one is assuming the answer will be there. This is the method used in the code you have.

(b) another possibility somewhat more cumbersome which I did not try at that time is,
    (1) shortly after sending a command (0.260 ms per character X number of characters), the addressed module should respond with the ACK sequence (ACK\n\r) - 3 characters ~ 1 ms plus some time to check the command
    (2) you could then wait for some time, say 5 ms, and send the sequence: geographic address, ACK (0x06), line-feed <== time for module to get ready with answer
    (3) wait for say 10 ms (equivalent to ~ 40 characters to be transferred back from the module)
    (4) read the input buffer,
        (4.1) if the input buffer has the sequence (ACK\n\r), the response is not ready yet <== go back to (2) & you can skip the 5 ms because the sequence was received 10ms ago
        (4.2) if the buffer has something that starts with ACK followed by the unique tag you sent with the command, the response has started. You then want to check that the last two entries in the buffer are \n\r which indicate the module finished transferring the data. If they are not there, there should be more data in the buffer or something got lost
    Depending on the system (e.g. assuming that linux really handles correctly the buffer swap) you should be able to reduce some of the above time waits (e.g. get incomplete buffers that you can recognize because the sequence \n\r is not there). In principle, this should be able to cope with modules having problems answering

Most of my tests were actually done not with Linux (the test code was from 2009) but with a realtime system so I was able to use interrupts and the ATT* line. I did some tests with a small board computer using Linux so that I could poll a bit to determine if the ATT* line has been raised but I did not complete them and I doubt I will get to it any time soon. One thing that I found in those tests however is that the default configuration of the serial port seems to change with the Linux version & flavor.

I do not think that this is your problem but here it is what I had to add while setting the serial port attributes. You probably also want to make sure that the modem control lines are disabled to avoid any noise, but this should already be true.

rs232_attr.c_cc[VTIME]= 0; /* read - inter-character timer unused */
rs232_attr.c_cc[VMIN] = 0; /* read - minimum number of characters */

Hope that it helps - Javier
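The per-character time quoted in the e-mail can be checked with a few lines. This is a sketch under two assumptions not stated in the e-mail: that "38.4 kB" means a line rate of 38400 baud, and that each character costs 10 bits on the wire (1 start + 8 data + 1 stop).

```python
def char_time_ms(baud=38400, bits_per_char=10):
    """Time to transfer one character, in milliseconds.

    bits_per_char=10 assumes 1 start + 8 data + 1 stop bit (an assumption).
    """
    return 1000.0 * bits_per_char / baud


def transfer_time_ms(n_chars, baud=38400, bits_per_char=10):
    """Time to transfer n_chars characters, in milliseconds."""
    return n_chars * char_time_ms(baud, bits_per_char)


print("%.3f ms per character" % char_time_ms())            # prints 0.260 ms per character
print("%.1f ms for 40 characters" % transfer_time_ms(40))  # prints 10.4 ms for 40 characters
```

Both numbers agree with the 0.260 ms and ~10.4 ms in the e-mail, so the fixed delays of tens of milliseconds are long compared with the raw transfer time of a typical reply.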
Jan 13, 2014
I implemented a new kind of check that replaces checkACK in the "loop forever" of Javier's C code. This loop is the ongoing dialog with the Perl Shim server. The new check is called bob1ACK. This code looks for a sequence "0x06 + [N char] + \n \r" where [N char] is N characters that may result from a reply of the card. In the simplest reply, N=0. This would be an acknowledgement (ACK) sequence. But for a reply, N can be larger, e.g. N=82. If a reply has not finished yet, you get no \n and \r, but only a truncated string. So, one must read again until the \n and \r are found, and one must concatenate the readings to form the reply. This is done now and seems to be fairly robust and fast.
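The read-and-concatenate idea can be sketched as follows. This is not the bob1ACK C code, only an illustration in Python of the framing described above; `read_chunk` is a hypothetical callable that returns whatever bytes the serial port currently has (possibly none), and `max_reads` is an assumed safety limit.

```python
ACK = b"\x06"
TERMINATOR = b"\n\r"  # order as written in Javier's e-mail: line-feed, carriage-return


def read_reply(read_chunk, max_reads=50):
    """Concatenate chunks until a complete 'ACK + N chars + \\n\\r' frame is seen.

    Returns the N payload bytes (empty for a bare ACK sequence), or None if no
    complete frame arrives within max_reads reads. A real loop would also
    sleep briefly between reads; that is left out of this sketch.
    """
    buf = b""
    for _ in range(max_reads):
        buf += read_chunk()
        start = buf.find(ACK)
        if start < 0:
            continue  # no ACK yet, keep reading
        end = buf.find(TERMINATOR, start + 1)
        if end >= 0:
            return buf[start + 1:end]  # frame complete
        # ACK seen but no terminator: reply is truncated, read again
    return None


# A reply that arrives in two pieces is reassembled into one payload
pieces = iter([b"\x06abc", b"def\n\r"])
print(read_reply(lambda: next(pieces)))  # prints b'abcdef'
```

The point of the design is that the loop ends as soon as the terminator shows up, instead of always waiting a fixed delay and hoping the reply is complete.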
A problem with both old and new code: sometimes if you do the operation (example: turn ON the crate) the command is ignored. The solution seems to be to change a Target value for the HV on one channel. This will "wake up" the code and it will do the operation (in this example: turn ON HV and also change the demand HV).
Here are some speed data.
This is for 10 HV cards in the DVCS HV crate. Times in seconds.
Old code with delay = 200K, as used on HRS now.
New code with bob1ACK

                                        Old    New
start the GUI:                          70     20
turn on/off HV                          40     10
change a HV value or enable/disable     20     10

It looks faster now.