User name R. Michaels
Log entry time 11:12:13 on November 9, 2009
Entry number 298554
This entry is a followup to: 298455
keyword=re: xscaler problems (run ONE copy, please !)
Last night it was found by Alex that removing the scaler server
(used by xscaler) made the DAQ work. He reminded me of the problem
of vxWorks running out of sockets -- this can happen if there
are too many copies of xscaler running. (Recall for example
halog 268481 from April 25). I consider this a bug in vxWorks:
sockets are held for a time "T" which is surprisingly long.
If the clients ask for data at a rate that exceeds 1/T the kernel
runs out of sockets. The network interface freezes and CODA fails.
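
A rough sketch of the arithmetic behind this, not a measurement. It assumes each request ties up one socket for the hold time T, so the steady-state number of sockets held is (request rate) x T. The 1/T limit above is then the per-socket rate; for a pool of N sockets the total sustainable rate would be N/T. The pool size, hold time, and per-client rate below are hypothetical numbers chosen only for illustration.

```python
def sockets_in_use(request_rate_hz: float, hold_time_s: float) -> float:
    """Steady-state number of sockets held: rate * T (Little's law)."""
    return request_rate_hz * hold_time_s


def max_sustainable_rate(pool_size: int, hold_time_s: float) -> float:
    """Highest total request rate a pool of pool_size sockets can absorb."""
    return pool_size / hold_time_s


# Hypothetical values, for illustration only (not measured on any ROC).
POOL_SIZE = 64             # assumed number of sockets available
HOLD_TIME_S = 60.0         # assumed hold time T per socket
RATE_PER_CLIENT_HZ = 0.5   # assumed request rate of one xscaler copy

for copies in (1, 2, 6):
    total_rate = copies * RATE_PER_CLIENT_HZ
    held = sockets_in_use(total_rate, HOLD_TIME_S)
    status = "exhausted" if held > POOL_SIZE else "ok"
    print(f"{copies} copies: about {held:.0f} sockets held -> {status}")
```

The point is only that the load scales with the number of copies running, which is why one copy is fine and six are not.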
I found and killed 6 copies of xscaler on the apar account on various
computers. Please run only one copy !! Let's agree to run it
on adaql5.
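
To check a machine for stray copies, something like the following could be used. It is a minimal sketch that assumes the text comes from `ps -eo user,pid,comm` on a Linux host; the function name and the sample text are hypothetical.

```python
def count_copies(ps_output: str, program: str = "xscaler") -> int:
    """Count processes named `program` in `ps -eo user,pid,comm` output."""
    count = 0
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # user, pid, command name
        if len(fields) == 3 and fields[2].strip() == program:
            count += 1
    return count


# Hypothetical sample output, for illustration only.
SAMPLE_PS = """USER       PID COMMAND
apar      1011 xscaler
apar      2022 xscaler
apar      3033 bash
"""

print(count_copies(SAMPLE_PS))
```

Anything above one copy across all machines means some should be killed.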
Next I ran this at vxworks prompt:
-> netStackSysPoolShow
It shows the Cluster Pool Table has only 64 clusters on ROC28
(pvdis1) whereas on ROC29 (pvdis2) there are 512. Old problem !
Running xscaler (or screadR) many times will eat up the clusters.
A symptom is that the "number of times failed to find space"
is nonzero. And "xscaler" or "screadR" will not return data.
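
A small sketch for checking that symptom automatically, assuming the output of netStackSysPoolShow has been captured as text (for example from a console log). Only the phrase "number of times failed to find space" is taken from the output; the rest of the layout, and the sample text, are assumptions.

```python
import re
from typing import Optional

# Matches the counter line; the separator after the phrase is assumed.
FAILED_RE = re.compile(
    r"number of times failed to find space\s*:?\s*(\d+)", re.IGNORECASE
)


def failed_to_find_space(show_output: str) -> Optional[int]:
    """Return the failure counter, or None if the line is not present."""
    match = FAILED_RE.search(show_output)
    return int(match.group(1)) if match else None


def pool_looks_starved(show_output: str) -> bool:
    """True when the failure counter is present and nonzero."""
    count = failed_to_find_space(show_output)
    return count is not None and count > 0


# Hypothetical captured text, for illustration only.
SAMPLE_OUTPUT = "number of times failed to find space: 3\n"
print(pool_looks_starved(SAMPLE_OUTPUT))
```

A nonzero counter is the cue to look for extra copies of xscaler or screadR.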
ROC28 is a 2400 cpu with only 32 MB, so recompiling the
kernel with more clusters is not recommended. (ROC29 is a
5100 cpu with 512 MB and 512 clusters.) From experience we
know that 32 MB + 64 clusters is marginal while 512 MB + 512 clusters
is adequate.
Just now I ran CODA and one copy of xscaler for each arm.
The number of free clusters and the network failure rate were stable
on both arms. I think this is a good way to run. Meanwhile I'll
hunt for a 5100 board in case we decide to swap out ROC28.
Alex's workaround (removing the scaler server, which disables xscaler
but keeps CODA working) is a good fallback in case of emergency.