Notes about DAQ Deadtime in Hall A


                             R. Michaels, JLAB, Oct 97
                             updated below, April 1998

  In this note I attempt to explain how to calculate
the deadtime corrections for normalizing the data.
(This is a really old note, but perhaps still useful.)

  I only consider DAQ, i.e. computer, deadtime, which
at present is on the order of 700 microseconds {note added
in April 98: this is now 300 microseconds} in the 
best case of our CODA 2.0 implementation.  Other deadtimes,
e.g. electronic deadtimes (measured to be of order 100 ns)
and detector deadtimes (unmeasured as yet but probably 
less than 1 microsecond) are NOT discussed and are also
not very important unless the single arm rate is of order
1 MHz.

  We distinguish two problems: 1) How to calculate
the overall deadtime, which is a number people on shift
try to keep below about 30%, and 2) How to calculate
the deadtime correction for a particular trigger type,
which of course is mathematically different from 1).
  The deadtime correction is defined from the probability
that a trigger would have been accepted.  With scalers
we count how many triggers of various types were created,
and from the datastream we can count how many trigger
types were accepted.  The ratio of the sum of triggers
accepted to the sum of triggers which tried to trigger the
DAQ is the "livetime" ratio (summed over trigger types), 
and the deadtime is simply 1-livetime.  This is what
is displayed in the deadtime monitor window.
  For a particular trigger type, say T5, a deadtime 
correction must be applied to get the cross section 
for that trigger type.  This is again the ratio of 
the number of T5 triggers we accepted to the number 
we tried to accept.  The number we tried to accept 
is the number of T5 measured in the scalers during
the run divided by the prescale factor for T5. The 
number we did accept is counted in an obvious way
from the datastream.
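
  As a concrete illustration, here is a minimal sketch (in Python)
of this per-trigger-type bookkeeping.  The function and variable
names are placeholders for illustration, not names from any Hall A
software.

    # Minimal sketch of the per-trigger-type livetime described above.
    def livetime_for_trigger(scaler_counts, prescale, accepted_in_data):
        """Livetime for one trigger type = accepted / attempted.
        scaler_counts    : raw scaler count for this type (no prescaling)
        prescale         : prescale factor applied in the trigger supervisor
        accepted_in_data : events of this type counted in the datastream
        """
        attempted = scaler_counts / prescale  # triggers that tried to fire the DAQ
        return accepted_in_data / attempted

    # Example: 1.2e6 T5 in the scalers, prescale factor 1, 9.0e5 T5 in the data
    lt = livetime_for_trigger(1.2e6, 1, 9.0e5)   # -> 0.75
    deadtime = 1.0 - lt                          # deadtime = 1 - livetime
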
  The subtleties in the deadtime correction formula
stem from the way the scalers work and from the way
the trigger supervisor works.  There is also a potential
subtlety in that, depending on how the timing is set
up, a trigger "N" could actually be a trigger "M".  This
would be a mistake, e.g. from having a wrong trigger
file, but there is no great danger in such a mistake 
since all frontend modules are read out for all trigger 
types, so if you can tell from the event topology what 
kind of trigger it was, you can analyze it accordingly.
  The scalers simply count the number of triggers of
each type which were produced by the system.  Each 
scaler channel is independent, and so the triggers
are counted independently and with no prescaling.  
  The trigger supervisor (TS) has a prescale function and
a memory lookup (MLU) function.  When the signals arrive
at the input, the prescale function acts FIRST.  A 
prescale factor of N means the trigger supervisor ignores
N-1 triggers until the Nth one comes along.  The MLU
function acts upon the prescaled trigger inputs.  In a
given time interval of "live" time (i.e. when the DAQ
can accept a trigger), the first trigger to arrive may
trigger the system, and then it is dead for typically
700 microseconds.  However, if a second trigger (or more)
signal arrives on a different input of the TS within 
10 nsec of the first, an overlap occurs and the TS in 
its present state of programming creates a trigger 14, 
seen as "event type 14" in the language of ESPACE.  
The 10 nsec is a hardware feature of the TS in its 
current state.
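
  Before the examples below, here is a toy sketch of this ordering
(prescaling first, then the 10 nsec overlap check).  DAQ deadtime is
ignored and all numbers and names are invented for illustration;
none of this comes from the real TS programming.

    # Toy model of the TS input stage: prescaling acts first, then inputs
    # surviving prescaling that overlap within 10 nsec become a trigger 14.
    # Deadtime is ignored; times are in nanoseconds.

    def prescale(times, p):
        """Keep every p-th trigger (the TS ignores p-1, then takes the p-th)."""
        return [t for i, t in enumerate(times, start=1) if i % p == 0]

    def classify(inputs, window=10.0):
        """inputs: {trigger type: list of arrival times, already prescaled}.
        Returns (time, type) pairs, where overlapping inputs become type 14."""
        hits = sorted((t, typ) for typ, times in inputs.items() for t in times)
        out, i = [], 0
        while i < len(hits):
            t0, typ = hits[i]
            group = {typ}
            while i + 1 < len(hits) and hits[i + 1][0] - t0 <= window:
                i += 1
                group.add(hits[i][1])
            out.append((t0, 14 if len(group) > 1 else group.pop()))
            i += 1
        return out

    # T1 and T3 arrive together every 1000 nsec; prescale T1 by 2, T3 by 3.
    t1 = [1000.0 * k for k in range(1, 13)]
    t3 = [1000.0 * k + 5.0 for k in range(1, 13)]   # within 10 nsec of T1
    events = classify({1: prescale(t1, 2), 3: prescale(t3, 3)})
    # -> T1 alone, T3 alone, and a trigger 14 whenever both survive prescaling
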
   Let's give a few simple examples. Example 1) We have
two trigger types T1, T2 which are timed to arrive at 
the TS within 10 nsec of each other, and the prescale 
factors are 1 and 10 respectively.  Assuming the rate 
is very low compared to 1/deadtime, the first 9 triggers
in the data would be T1, and the 10th 
would be trigger 14 because of the overlap programming.
Example 2): We have three trigger types 1,3,5 where T5 is 
forced to occur 30 nsec before T1 and T3, but (T1,T3)
are overlapping within 10 nsec.  Let's say that every
time a T5 appears at the input (before prescaling), a
T1 and T3 also appear at the input (before prescaling).
Let us say that the prescale factor for T5 is 1, and
for T1 and T3 they are respectively 12 and 100.  In this
case, we'd always see a T5 and never see T1,T3 -- unless 
there were additional T1 and T3 which did not belong to 
a T5.  [This example probably sounds familiar from our 
experimental setup where T5 is formed externally to the 
TS and is the "and" of T1 and T3.]  So, if there were 
also T1,T3 which were not correlated with T5, we'd see
every 1 in 12 T1, and 1 in 100 T3.  Example 3): Let's take
example 2 and modify it slightly -- let the prescale factor
for T5 be 2.  Then we'd only see every second T5.  But
the T1,T3 overlap within 10 nsec, so we'd start seeing T14.
How many would we see ?  Since the MLU acts after the
prescaling, the T14 rate observed in the datastream would be:

               (p5-1)*T5
      T14 =   ------------  * Z * lt
                p5*p1*p3

  where T5 is the rate of T5 (scaler), and p1,p3,p5
are the prescale factors, and Z is the probability
to overlap within 10 nsec (Z=1 in this example),
and "lt" is the livetime = 1-deadtime.
  Note that this formula is not correct if the ratio
p3/p1 or p1/p3 is an integer.  E.g. if p1=10 and 
p3=100 there will be overlaps for every T3.  In that
case  T14 = (p5-1)*T5*lt / (p5*p3).
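
  These two cases can be restated in a few lines of Python.  The
sketch below just repeats the formulas above; T5 is the raw scaler
rate, p1,p3,p5 the prescale factors, Z the overlap probability, lt
the livetime, and all names are placeholders.

    # T14 rate from example 3, including the special case where p3/p1 or
    # p1/p3 is an integer (then only the larger prescale factor matters).
    def t14_rate(T5, p1, p3, p5, Z, lt):
        if (p3 % p1 == 0) or (p1 % p3 == 0):
            return (p5 - 1) * T5 * Z * lt / (p5 * max(p1, p3))
        return (p5 - 1) * T5 * Z * lt / (p5 * p1 * p3)

    # Example 3 from the text: p5 = 2, p1 = 12, p3 = 100, Z = 1
    print(t14_rate(T5=1000.0, p1=12, p3=100, p5=2, Z=1.0, lt=0.9))
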
  We point out that the T14 can come from several
origins: 1) From the mechanism described in example 3,
where T1 and T3 are correlated to T5;  2) From the
timing setup, where inputs sometimes or always arrive 
within 10 nsec of each other.  
  Let us consider the second case.  In March of this
year, during commissioning, the T1 did come within 10 nsec
of T5 with a certain probability Z, which was 1 for pulser
setup and probably near 1 for beam. The result was that 
in the datastream we counted T5 and T14:

   T14-in-data = [ (Z/p1) * (T5/p5) ] * lt 
   T5-in-data =  [ (1 - Z/p1) * (T5/p5) ] * lt
   (Note if Z=1 and p1=1, we would only see T14)

  where lt = "livetime" = 1 - deadtime, T5 is the 
un-prescaled rate (measured with scalers), p1,p5 are
prescale factors.  These T14 should be analyzed the 
SAME way as T5.  So adding up the number of T14 and T5 
seen by the analysis, one gets
   
   T14-in-data + T5-in-data = lt * T5 / p5.
   
One can solve for lt, the correction to the cross section.
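
  As a minimal sketch (placeholder names; counts summed over a run
can be used in place of rates):

    # Solve for the livetime in the T1/T5 overlap case, using
    #   T14-in-data + T5-in-data = lt * T5 / p5
    def livetime_from_t5_and_t14(n_t5_in_data, n_t14_in_data, scaler_t5, p5):
        attempted_t5 = scaler_t5 / p5        # un-prescaled T5 from the scalers
        return (n_t5_in_data + n_t14_in_data) / attempted_t5

    # Example: 9.0e5 T5 and 1.0e5 T14 in the data, 1.25e6 T5 in scalers, p5 = 1
    lt = livetime_from_t5_and_t14(9.0e5, 1.0e5, 1.25e6, 1)   # -> 0.80
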
   If T3 also overlaps, the formulas are more complicated.
In general, I have tried to avoid all these complications
by forcing T5 to arrive 30 nsec before T1,T3.  This, 
combined with the fact that p1 and p3 were usually much
larger than p5, should make it rare to see T14 from this 
cause. One still sees T14, however, when p5 is not 1.  
I would say that if one sees T14 when p5=1, then it is 
due to the aforementioned overlaps of T1 with T5, or of
T3 with T5.
   Another subtlety: if a T1 or T3 comes >10 nsec before T5, 
it can "kill" the T5.  If such a singles trigger was the 
origin of the T5 it killed, this is potentially a source 
of inefficiency, unless you can recognize from the data 
(e.g. the coincidence time peak) that the singles trigger 
was really a coincidence.  
   Let us now consider the deadtime summed over
trigger types (which is NOT the same as the deadtime
correction of a particular trigger type) and for the 
case where T5 is split from T1,T3 but T1 and T3 overlap. 
We also have T2 and T4 which are exclusive of T1,T3,T5,
so they normally don't overlap at the TS (at least
we'll assume they don't).  Then the deadtime summed
over trigger types is
   
  1 - [TSout / (x1 + x2 + x3 + x4 + x5 + 
                xoverlap - x1corr - x3corr)]

  TSout = observed DAQ rate
  x1 = T1/p1, x2 = T2/p2, x3 = T3/p3, 
  x4 = T4/p4, x5 = T5/p5
  T1,T2,T3,T4,T5 = trigger rates (measured with scalers)
  if the ratios p3/p1 and p1/p3 are not integers:
    xoverlap = T5*Z*(p5-1)/(p1*p3*p5) = the T14 rate
  else if one of the ratios is an integer
    xoverlap = T5*Z*(p5-1)/(p5*max(p1,p3)) = the T14 rate
  Z = probability of T1 and T3 to overlap (about 1)
  x1corr = (x5 + xoverlap)/p1
  x3corr = (x5 + xoverlap)/p3
 
   The correction factors x1corr and x3corr need 
explaining. For every T5 there is also a T1,T3,
but they are discarded because T5 takes precedence.
This discarding occurs in addition to the prescaling.
The scaler rates T1,T3 therefore over-count the number
of times T1,T3 tried to trigger the system.  The 
probability for T5 to overlap with a Ti (i=1 or 3) 
which would NOT have been discarded by its own prescale 
factor pi, is 1/pi.  The scaler counts of T1 and T3
also have to be corrected for the T14 overlaps, which
leads to the xoverlap component of x1corr, x3corr.
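  Putting the pieces above together, here is a minimal sketch of
the summed deadtime (placeholder names; the rates can be taken from
the scalers at the end of a run):

    # Deadtime summed over trigger types, following the formula above.
    # T and p are dicts keyed by trigger type 1..5; TSout is the observed
    # accepted-trigger rate; Z is the T1/T3 overlap probability.
    def summed_deadtime(T, p, TSout, Z=1.0):
        x = {i: T[i] / p[i] for i in (1, 2, 3, 4, 5)}
        if (p[3] % p[1] == 0) or (p[1] % p[3] == 0):
            xoverlap = T[5] * Z * (p[5] - 1) / (p[5] * max(p[1], p[3]))
        else:
            xoverlap = T[5] * Z * (p[5] - 1) / (p[5] * p[1] * p[3])
        x1corr = (x[5] + xoverlap) / p[1]   # T1 discarded when T5 takes precedence
        x3corr = (x[5] + xoverlap) / p[3]
        attempted = x[1] + x[2] + x[3] + x[4] + x[5] + xoverlap - x1corr - x3corr
        return 1.0 - TSout / attempted

    # Example with made-up rates (Hz) and prescale factors
    T = {1: 2.0e4, 2: 5.0e3, 3: 1.5e4, 4: 3.0e3, 5: 6.0e2}
    p = {1: 12, 2: 1, 3: 100, 4: 1, 5: 1}
    print(summed_deadtime(T, p, TSout=8.0e3))   # about 23% deadtime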

  We close with a few points to emphasize:

  1) If p5=1, and p1,p2,p3,p4 >> p5, the deadtime
     summed over trigger types is a quite simple
     formula (which people used for a long time) and
     there are few T14.  This was the case for most 
     of the beam time.

  2) It is important to realize that the correction
     factor for a trigger T5, for example, is NOT
     the deadtime summed over triggers.  It is simply
     the fraction of T5 which were accepted from the
     number of T5 which tried to trigger the system.
     We know how many were accepted from counting them
     in the data file, and we know how many triggers
     we had from the scalers.  Thus, the livetime of T5:
          livetime = lt =  (T5-in-data)/(T5/p5)
     As already mentioned, there are subtleties if the
     T5 are not split to come well before the T1,T3, 
     e.g. if T5 overlaps with one of the other trigger 
     types, leading to T14 (seen when p5 is not 1). 
     I have already explained one such case (T1,T5 
     overlap) where
        T14-in-data + T5-in-data = lt * T5 / p5.
     If T5 is set up to come too late, one may also
     see T1 or T3 which are actually T5.

  3) Paul Ulmer has suggested latching the MLU trigger
     patterns in the datastream.  We found out
     Oct 28, 97 that this was easy to do since the TS has
     a set of 12 outputs corresponding to the 12 trigger
     types, so we can put these into a TDC to latch the
     trigger pattern. So, if a T5 is accompanied by a
     prescaled T3, we'll see channels 5 and 3 in the TDC.
     This way we can sort out the prescaled singles triggers,
     even if they belong to a coincidence trigger.  It also
     simplifies the deadtime formulas.  This scheme is
     implemented as of Oct 28, 97.  [The TDC is a model
     1877, 5 microsecond range, multihit with max of 6 hits
     per channel; the signals are at about 1/2 range, i.e. channel
     2500.  Since a channel can have up to 6 hits per event, 
     the true hit should always be visible.  At present,
     the TDC is "spare7 detector" in the detector map.
     Channels 1-12 of spare7 are the triggers 1-12.
     The triggers arrive at the TDC AFTER prescaling.]  A small
     sketch of decoding this pattern is given after this list.

  4) Although I'm not going to rewrite this page, I will
     offer my lone update for 1999:  We have two new ways
     to compute deadtime from scalers.  Two scaler channels
     count signals gated by DAQ "not busy" pulse.  The signals
     are Q10 and clock.  Q10 is the charge signal from V-to-F
     and clock is the 1024 Hz clock.  Since these signals are
     also scaled in other channels, gated by the run, the ratio
     of the "not busy"-gated channels to the run-gated channels
     should give the livetime, and hence the deadtime correction
     (see the sketch after this list).  It is worth trying early
     in the experiment.
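
  To make points 3) and 4) concrete, here are two minimal sketches.
The names and cut values are placeholders: the prompt window around
channel 2500 in the first sketch is an assumed value for illustration,
and the scaler readings in the second are invented.

    # Point 3: decode the latched trigger pattern from the 1877 TDC
    # ("spare7", channels 1-12).  Hits near channel 2500 are prompt;
    # the cut window [2400, 2600] is an assumed value for illustration.
    def trigger_pattern(tdc_hits, lo=2400, hi=2600):
        """tdc_hits: {TDC channel (1-12): list of hit values (up to 6)}.
        Returns the set of trigger types with a hit in the prompt window."""
        return {ch for ch, hits in tdc_hits.items()
                if any(lo <= h <= hi for h in hits)}

    # e.g. a T5 accompanied by a prescaled T3, plus a random hit on channel 1
    pattern = trigger_pattern({5: [2510], 3: [2497], 1: [740]})   # -> {3, 5}

    # Point 4: livetime from the "not busy"-gated clock (or Q10) channel
    # divided by the run-gated channel.
    def livetime_from_gated_clock(clock_not_busy, clock_total):
        return clock_not_busy / clock_total

    lt = livetime_from_gated_clock(9.3e5, 1.0e6)   # -> 0.93, deadtime 7%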


---------  Checks on Deadtime and Other Problems ------------

  Since writing the above epic tome, there have been a 
few developments.  We added in the scalers a channel that
counts the random T5.  The electron arm signal was delayed
several microseconds (I think it's about 3 microseconds) and
AND'd with the hadron arm signal, and this was fed into a 
scaler.  Another development is that the E89003 experiment
has observed for several runs that its deadtime correction
for T5 is much larger than that for T1 or T3, and is typically
40% even at fairly low rates.  The effect is not yet understood.
The danger this poses to normalizing an experiment illustrates
why it is very important to check the following items during
data taking.  If all these checks work out, you can probably
calculate the deadtime correctly.

  1. Scalers Working ?
     Check that there are no large asymmetries in rates,
     apart from what might be expected.  Check that the
     sum of rates on 6 scintillator paddles is approximately
     the trigger rate, and that the S1 and S2 planes have about
     the same rates (within factors expected by geometry).
     Not only the online scaler display but the offline 
     record of scalers should be checked.  

  2. Cross Section ?
     A rough check of the cross section at the 20% level should
     be done using scaler information and charge.  

  3. Deadtimes the same for every event type ?
     The % deadtime for T1,T3, and T5 should be the same.
     This can be checked by counting how many of each event
     type appear in the CODA file and how many are expected
     from scalers and prescale factors.  Be careful -- CODA
     writes to a data file but when you read this there are
     many ways to artificially reject events and miscount.
     For example, the "missing gate" and "extra gate" events
     are rejected by ESPACE before any histograms are incremented.
     Look at the raw data !

  4. The TDC that latches the trigger pattern makes sense ?
     The TDC trigger pattern (note 3, Paul Ulmer's idea, see
     above) could be used to explicitly sort the events that 
     belong to two trigger types (e.g. a T5 accompanied by a T3)
     and calculate deadtimes directly, because you know precisely
     how many of each event type were accepted and there are
     no ambiguities from overlaps.  Also, the event type should
     be correlated in a sensible way to the trigger pattern.
     For most events the event type will simply equal the
     pattern number.  But there may also be triggers 1 or 3
     which accompany trigger 5, for example.  Some combinations
     are illegal, for example an event type 2 should not be
     accompanied by a trigger 1, since they are exclusive.
     But be careful -- the TDC that latches this pattern has
     a 5 microsecond window.  There are random overlaps.  It is
     necessary to look at the TDC distribution, recognize the
     region of real events and apply a cut.

  5. Sum rule for TS-out
     The scaler channel counting accepted triggers should equal the 
     sum of event types 1-12 and 14 in the CODA file (in practice
     only types 1-5, 8, and 14 occur at present).  A small sketch
     of this check is included after this list.

  6. Check of randoms
     A useful check that T1, T3, and T5 make sense is to compute
     the random T5 rates as many ways as possible.  Examples of
     things that can (and have) gone wrong: a) An electronics channel
     is wrongly terminated and the reflections cause double-counting;
     b) PMT's double pulse with some probability; c) A piece of
     electronics fails affecting one rate but not the others.

     There are presently 3 ways to check the randoms.  They should
     agree !  (A small sketch comparing them follows this list.)

       a) Calculate them from T1 and T3 rates
       b) Calculate them from T5 rates and the TC spectrum.
          You simply correct T5 for the ratio of trues to accidentals.
       c) There is a scaler channel that counts the random
          T5 directly.  The E-arm signal and H-arm signal
          are AND'd, and the E-arm signal has a ~3 microsec
          delay, so the AND's are random.
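
  Here is a minimal sketch of checks 5 and 6.  The names are
placeholders, and the resolving-time formula in randoms_from_singles
is a standard accidental-coincidence estimate assumed here, not a
formula quoted elsewhere in this note.

    # Check 5: the accepted-trigger (TS-out) scaler should match the number
    # of physics events (types 1-12 and 14) counted in the CODA file.
    def ts_out_sum_rule(ts_out_scaler, counts_by_event_type):
        physics_types = list(range(1, 13)) + [14]
        n_events = sum(counts_by_event_type.get(t, 0) for t in physics_types)
        return ts_out_scaler - n_events          # should be close to 0

    # Check 6: three estimates of the random T5 rate, which should agree.
    def randoms_from_singles(rate_t1, rate_t3, coinc_window):
        # accidental rate ~ R1 * R3 * coincidence window (assumed estimate)
        return rate_t1 * rate_t3 * coinc_window

    def randoms_from_tc_spectrum(rate_t5, accidental_fraction):
        # scale the T5 rate by the accidental fraction seen in the TC spectrum
        return rate_t5 * accidental_fraction

    # The third estimate is read directly from the delayed-AND scaler channel.
    est_a = randoms_from_singles(2.0e4, 1.5e4, 100.0e-9)    # -> 30 Hz
    est_b = randoms_from_tc_spectrum(500.0, 0.06)           # -> 30 Hz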


  ------- Causes of Deadtime   ----------------------------

  Here are some causes of DAQ deadtime.  I'll classify them
  as "under control" and "hard to predict".  The "under control"
  deadtimes can probably be measured and understood with scope
  measurements.  The "hard to predict" deadtimes come and go
  and may be impossible to predict from features of your data
  such as event size.  

  (See also the FAQ in dtime_faq.html.)

  I. Deadtimes which are under control

  1. Fastbus readout (fortunately the dominant one with CODA 2.0)

         Can be measured on a scope, was of order 1 - 1.4 msec
     for CODA 1.4 (my dim recollection).  Will depend on event size.
     With the 5 microsec (presently 1.5) window on TDC-1877, there will 
     be an event-size effect at 10's of kHz.  Will be different for 
     different crates and event types.  

  II.  Deadtimes which are hard to predict

  1. Network
   
       This is partly under control, when the network is behaving,
       because it should be measurable.  The unpredictability comes 
       from the fact that it was a competing deadtime for CODA 1.4
       and it was sometimes bad, but not known why.  This is much 
       less a problem with CODA 2.0 setup.

  2. Disk I/O
 
       When reading rapidly from the same partition to which we
       are writing, the disk head jumps around.  It can cause a
       significant deadtime.  (We avoid this now.)

  3. Online Analyzer / Event Builder

       A main possibility is having too many processes on the
       workstation, which causes the event builder to be swapped
       out of memory.  
       We've gotten better at avoiding these problems.

  4. Various Wrongly Set Parameters in CODA

       This did not affect E89003, but it happened recently.



  --------  Monte Carlo by M. Liang ---------

  A Monte Carlo was written by M. Liang for the E89003 experiment
  which calculates the deadtime corrections.  See 

       http://www.jlab.org/~mliang/deadtime_cor/

  and ask Meme for details.

This page maintained by rom@jlab.org