Notes about DAQ Deadtime in Hall A


                             R. Michaels, JLAB, Oct 97
                             updated below, April 1998

  In this note I attempt to explain how to calculate
the deadtime corrections for normalizing the data.
(This is a really old note, but perhaps still useful.)

  I only consider DAQ, i.e. computer, deadtime, which
at present is on the order of 700 microseconds {note added
in April 98: this is now 300 microseconds} in the 
best case of our CODA 2.0 implementation.  Other deadtimes,
e.g. electronic deadtimes (measured to be of order 100 ns)
and detector deadtimes (unmeasured as yet but probably 
less than 1 microsecond) are NOT discussed and are also
not very important unless the single arm rate is of order
1 MHz.

  We distinguish two problems: 1) How to calculate
the overall deadtime, which is a number people on shift
try to keep below about 30%, and 2) How to calculate
the deadtime correction for a particular trigger type,
which of course is mathematically different from 1).
  The deadtime correction is defined from the probability
that a trigger would have been accepted.  With scalers
we count how many triggers of various types were created,
and from the datastream we can count how many trigger
types were accepted.  The ratio of the sum of triggers
accepted to the sum of triggers which tried to trigger the
DAQ is the "livetime" ratio (summed over trigger types), 
and the deadtime is simply 1-livetime.  This is what
is displayed in the deadtime monitor window.
  For a particular trigger type, say T5, a deadtime 
correction must be applied to get the cross section 
for that trigger type.  This is again the ratio of 
the number of T5 triggers we accepted to the number 
we tried to accept.  The number we tried to accept 
is the number of T5 measured in the scalers during
the run divided by the prescale factor for T5. The 
number we did accept is counted in an obvious way
from the datastream.
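
  As a concrete illustration, here is a minimal sketch (in Python)
of this per-trigger-type bookkeeping.  The function and variable
names are placeholders for illustration, not names from any Hall A
software.

    # Minimal sketch of the per-trigger-type livetime described above.
    def livetime_for_trigger(scaler_counts, prescale, accepted_in_data):
        """Livetime for one trigger type = accepted / attempted.
        scaler_counts    : raw scaler count for this type (no prescaling)
        prescale         : prescale factor applied in the trigger supervisor
        accepted_in_data : events of this type counted in the datastream
        """
        attempted = scaler_counts / prescale  # triggers that tried to fire the DAQ
        return accepted_in_data / attempted

    # Example: 1.2e6 T5 in the scalers, prescale factor 1, 9.0e5 T5 in the data
    lt = livetime_for_trigger(1.2e6, 1, 9.0e5)   # -> 0.75
    deadtime = 1.0 - lt                          # deadtime = 1 - livetime
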
  The subtleties in the deadtime correction formula
stem from the way the scalers work and from the way
the trigger supervisor works.  There is also a potential
subtlety in that, depending on how the timing is set
up, a trigger "N" could actually be a trigger "M".  This
would be a mistake, e.g. from having a wrong trigger
file, but there is no great danger in such a mistake 
since all frontend modules are read out for all trigger 
types, so if you can tell from the event topology what 
kind of trigger it was, you can analyze it accordingly.
  The scalers simply count the number of triggers of
each type which were produced by the system.  Each 
scaler channel is independent, and so the triggers
are counted independently and with no prescaling.  
  The trigger supervisor (TS) has a prescale function and
a memory lookup (MLU) function.  When the signals arrive
at the input, the prescale function acts FIRST.  A 
prescale factor of N means the trigger supervisor ignores
N-1 triggers until the Nth one comes along.  The MLU
function acts upon the prescaled trigger inputs.  In a
given time interval of "live" time (i.e. when the DAQ
can accept a trigger), the first trigger to arrive may
trigger the system, and then it is dead for typically
700 microseconds.  However, if a second trigger (or more)
signal arrives on a different input of the TS within 
10 nsec of the first, an overlap occurs and the TS in 
its present state of programming creates a trigger 14, 
seen as "event type 14" in the language of ESPACE.  
The 10 nsec is a hardware feature of the TS in its 
current state.
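
  Before the examples below, here is a toy sketch of this ordering
(prescaling first, then the 10 nsec overlap check).  DAQ deadtime is
ignored and all numbers and names are invented for illustration;
none of this comes from the real TS programming.

    # Toy model of the TS input stage: prescaling acts first, then inputs
    # surviving prescaling that overlap within 10 nsec become a trigger 14.
    # Deadtime is ignored; times are in nanoseconds.

    def prescale(times, p):
        """Keep every p-th trigger (the TS ignores p-1, then takes the p-th)."""
        return [t for i, t in enumerate(times, start=1) if i % p == 0]

    def classify(inputs, window=10.0):
        """inputs: {trigger type: list of arrival times, already prescaled}.
        Returns (time, type) pairs, where overlapping inputs become type 14."""
        hits = sorted((t, typ) for typ, times in inputs.items() for t in times)
        out, i = [], 0
        while i < len(hits):
            t0, typ = hits[i]
            group = {typ}
            while i + 1 < len(hits) and hits[i + 1][0] - t0 <= window:
                i += 1
                group.add(hits[i][1])
            out.append((t0, 14 if len(group) > 1 else group.pop()))
            i += 1
        return out

    # T1 and T3 arrive together every 1000 nsec; prescale T1 by 2, T3 by 3.
    t1 = [1000.0 * k for k in range(1, 13)]
    t3 = [1000.0 * k + 5.0 for k in range(1, 13)]   # within 10 nsec of T1
    events = classify({1: prescale(t1, 2), 3: prescale(t3, 3)})
    # -> T1 alone, T3 alone, and a trigger 14 whenever both survive prescaling
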
   Let's give a few simple examples. Example 1) We have
two trigger types T1, T2 which are timed to arrive at 
the TS within 10 nsec of each other, and the prescale 
factors are 1 and 10 respectively.  Assuming the rate 
is very low compared to 1/deadtime, the first 9 triggers
in the data would be T1, and the 10th 
would be trigger 14 because of the overlap programming.
Example 2): We have three trigger types 1,3,5 where T5 is 
forced to occur 30 nsec before T1 and T3, but (T1,T3)
are overlapping within 10 nsec.  Let's say that every
time a T5 appears at the input (before prescaling), a
T1 and T3 also appear at the input (before prescaling).
Let us say that the prescale factor for T5 is 1, and
for T1 and T3 they are respectively 12 and 100.  In this
case, we'd always see a T5 and never see T1,T3 -- unless 
there were additional T1 and T3 which did not belong to 
a T5.  [This example probably sounds familiar from our 
experimental setup where T5 is formed externally to the 
TS and is the "and" of T1 and T3.]  So, if there were 
also T1,T3 which were not correlated with T5, we'd see
every 1 in 12 T1, and 1 in 100 T3.  Example 3): Let's take
example 2 and modify it slightly -- let the prescale factor
for T5 be 2.  Then we'd only see every second T5.  But
the T1,T3 overlap within 10 nsec, so we'd start seeing T14.
How many would we see ?  Since the MLU acts after the
prescaling, the T14 rate observed in the datastream would be:

               (p5-1)*T5
      T14 =   ------------  * Z * lt
                p5*p1*p3

  where T5 is the rate of T5 (scaler), and p1,p3,p5
are the prescale factors, and Z is the probability
to overlap within 10 nsec (Z=1 in this example),
and "lt" is the livetime = 1-deadtime.
  Note that this formula is not correct if the ratio
p3/p1 or p1/p3 is an integer.  E.g. if p1=10 and 
p3=100 there will be overlaps for every T3.  In that
case  T14 = (p5-1)*T5*lt / (p5*p3).
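
  These two cases can be restated in a few lines of Python.  The
sketch below just repeats the formulas above; T5 is the raw scaler
rate, p1,p3,p5 the prescale factors, Z the overlap probability, lt
the livetime, and all names are placeholders.

    # T14 rate from example 3, including the special case where p3/p1 or
    # p1/p3 is an integer (then only the larger prescale factor matters).
    def t14_rate(T5, p1, p3, p5, Z, lt):
        if (p3 % p1 == 0) or (p1 % p3 == 0):
            return (p5 - 1) * T5 * Z * lt / (p5 * max(p1, p3))
        return (p5 - 1) * T5 * Z * lt / (p5 * p1 * p3)

    # Example 3 from the text: p5 = 2, p1 = 12, p3 = 100, Z = 1
    print(t14_rate(T5=1000.0, p1=12, p3=100, p5=2, Z=1.0, lt=0.9))
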
  We point out that the T14 can come from several
origins: 1) From the mechanism described in example 3,
where T1 and T3 are correlated to T5;  2) From the
timing setup, where inputs sometimes or always arrive 
within 10 nsec of each other.  
  Let us consider the second case.  In March of this
year, during commissioning, the T1 did come within 10 nsec
of T5 with a certain probability Z, which was 1 for pulser
setup and probably near 1 for beam. The result was that 
in the datastream we counted T5 and T14:

   T14-in-data = [ (Z/p1) * (T5/p5) ] * lt 
   T5-in-data =  [ (1 - Z/p1) * (T5/p5) ] * lt
   (Note if Z=1 and p1=1, we would only see T14)

  where lt = "livetime" = 1 - deadtime, T5 is the 
un-prescaled rate (measured with scalers), p1,p5 are
prescale factors.  These T14 should be analyzed the 
SAME way as T5.  So adding up the number of T14 and T5 
seen by the analysis, one gets
   
   T14-in-data + T5-in-data = lt * T5 / p5.
   
One can solve for lt, the correction to the cross section.
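
  As a minimal sketch (placeholder names; counts summed over a run
can be used in place of rates):

    # Solve for the livetime in the T1/T5 overlap case, using
    #   T14-in-data + T5-in-data = lt * T5 / p5
    def livetime_from_t5_and_t14(n_t5_in_data, n_t14_in_data, scaler_t5, p5):
        attempted_t5 = scaler_t5 / p5        # un-prescaled T5 from the scalers
        return (n_t5_in_data + n_t14_in_data) / attempted_t5

    # Example: 9.0e5 T5 and 1.0e5 T14 in the data, 1.25e6 T5 in scalers, p5 = 1
    lt = livetime_from_t5_and_t14(9.0e5, 1.0e5, 1.25e6, 1)   # -> 0.80
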
   If T3 also overlaps, the formulas are more complicated.
In general, I have tried to avoid all these complications
by forcing T5 to arrive 30 nsec before T1,T3.  This, 
combined with the fact that p1 and p3 were usually much
larger than p5, should make it rare to see T14 from this 
cause. One still sees T14, however, when p5 is not 1.  
I would say that if one sees T14 when p5=1, then it is 
due to the aforementioned overlaps of T1 with T5, or of
T3 with T5.
   Another subtlety: if a T1 or T3 comes >10 nsec before T5, 
it can "kill" the T5.  If such a singles trigger was the 
origin of the T5 it killed, this is potentially a source 
of inefficiency, unless you can recognize from the data 
(e.g. the coincidence time peak) that the singles trigger 
was really a coincidence.  
   Let us now consider the deadtime summed over
trigger types (which is NOT the same as the deadtime
correction of a particular trigger type) and for the 
case where T5 is split from T1,T3 but T1 and T3 overlap. 
We also have T2 and T4 which are exclusive of T1,T3,T5,
so they normally don't overlap at the TS (at least
we'll assume they don't).  Then the deadtime summed
over trigger types is
   
  1 - [TSout / (x1 + x2 + x3 + x4 + x5 + 
                xoverlap - x1corr - x3corr)]

  TSout = observed DAQ rate
  x1 = T1/p1, x2 = T2/p2, x3 = T3/p3, 
  x4 = T4/p4, x5 = T5/p5
  T1,T2,T3,T4,T5 = trigger rates (measured with scalers)
  if the ratios p3/p1 and p1/p3 are not integers:
    xoverlap = T5*Z*(p5-1)/(p1*p3*p5) = the T14 rate
  else if one of the ratios is an integer
    xoverlap = T5*Z*(p5-1)/(p5*max(p1,p3)) = the T14 rate
  Z = probability of T1 and T3 to overlap (about 1)
  x1corr = (x5 + xoverlap)/p1
  x3corr = (x5 + xoverlap)/p3
 
   The correction factors x1corr and x3corr need 
explaining. For every T5 there is also a T1,T3,
but they are discarded because T5 takes precedence.
This discarding occurs in addition to the prescaling.
The scaler rates T1,T3 therefore over-count the number
of times T1,T3 tried to trigger the system.  The 
probability for T5 to overlap with a Ti (i=1 or 3) 
which would NOT have been discarded by its own prescale 
factor pi, is 1/pi.  The scaler counts of T1 and T3
also have to be corrected for the T14 overlaps, which
leads to the xoverlap component of x1corr, x3corr.
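  Putting the pieces above together, here is a minimal sketch of
the summed deadtime (placeholder names; the rates can be taken from
the scalers at the end of a run):

    # Deadtime summed over trigger types, following the formula above.
    # T and p are dicts keyed by trigger type 1..5; TSout is the observed
    # accepted-trigger rate; Z is the T1/T3 overlap probability.
    def summed_deadtime(T, p, TSout, Z=1.0):
        x = {i: T[i] / p[i] for i in (1, 2, 3, 4, 5)}
        if (p[3] % p[1] == 0) or (p[1] % p[3] == 0):
            xoverlap = T[5] * Z * (p[5] - 1) / (p[5] * max(p[1], p[3]))
        else:
            xoverlap = T[5] * Z * (p[5] - 1) / (p[5] * p[1] * p[3])
        x1corr = (x[5] + xoverlap) / p[1]   # T1 discarded when T5 takes precedence
        x3corr = (x[5] + xoverlap) / p[3]
        attempted = x[1] + x[2] + x[3] + x[4] + x[5] + xoverlap - x1corr - x3corr
        return 1.0 - TSout / attempted

    # Example with made-up rates (Hz) and prescale factors
    T = {1: 2.0e4, 2: 5.0e3, 3: 1.5e4, 4: 3.0e3, 5: 6.0e2}
    p = {1: 12, 2: 1, 3: 100, 4: 1, 5: 1}
    print(summed_deadtime(T, p, TSout=8.0e3))   # about 23% deadtime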

  We close with a few points to emphasize:

  1) If p5=1, and p1,p2,p3,p4 >> p5, the deadtime
     summed over trigger types is a quite simple
     formula (which people used for a long time) and
     there are few T14.  This was the case for most 
     of the beam time.

  2) It is important to realize that the correction
     factor for a trigger T5, for example, is NOT
     the deadtime summed over triggers.  It is simply
     the fraction of T5 which were accepted from the
     number of T5 which tried to trigger the system.
     We know how many were accepted from counting them
     in the data file, and we know how many triggers
     we had from the scalers.  Thus, the livetime of T5:
          livetime = lt =  (T5-in-data)/(T5/p5)
     As already mentioned, there are subtleties if the
     T5 are not split to come well before the T1,T3, 
     e.g. if T5 overlaps with one of the other trigger 
     types, leading to T14 (seen when p5 is not 1). 
     I have already explained one such case (T1,T5 
     overlap) where
        T14-in-data + T5-in-data = lt * T5 / p5.
     If T5 is set up to come too late, one may also
     see T1 or T3 which are actually T5.

  3) Paul Ulmer has suggested latching the MLU trigger
     patterns in the datastream.  We found out
     Oct 28, 97 that this was easy to do since the TS has
     a set of 12 outputs corresponding to the 12 trigger
     types, so we can put these into a TDC to latch the
     trigger pattern. So, if a T5 is accompanied by a
     prescaled T3, we'll see channels 5 and 3 in the TDC.
     This way we can sort out the prescaled singles triggers,
     even if they belong to a coincidence trigger.  It also
     simplifies the deadtime formulas.  This scheme is
     implemented as of Oct 28, 97.  [The TDC is a model
     1877, 5 microsecond range, multihit with max of 6 hits
     per channel; the signals are at about 1/2 range, i.e. channel
     2500.  Since a channel can have up to 6 hits per event, 
     the true hit should always be visible.  At present,
     the TDC is "spare7 detector" in the detector map.
     Channels 1-12 of spare7 are the triggers 1-12.
     The triggers arrive at the TDC AFTER prescaling.]  A small
     sketch of decoding this pattern is given after this list.

  4) Although I'm not going to rewrite this page, I will
     offer my lone update for 1999:  We have two new ways
     to compute deadtime from scalers.  Two scaler channels
     count signals gated by DAQ "not busy" pulse.  The signals
     are Q10 and clock.  Q10 is the charge signal from V-to-F
     and clock is the 1024 Hz clock.  Since these signals are
     also scaled in other channels, gated by the run, the ratio
     of the "not busy"-gated channels to the run-gated channels
     should give the livetime, and hence the deadtime correction
     (see the sketch after this list).  It is worth trying early
     in the experiment.
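
  To make points 3) and 4) concrete, here are two minimal sketches.
The names and cut values are placeholders: the prompt window around
channel 2500 in the first sketch is an assumed value for illustration,
and the scaler readings in the second are invented.

    # Point 3: decode the latched trigger pattern from the 1877 TDC
    # ("spare7", channels 1-12).  Hits near channel 2500 are prompt;
    # the cut window [2400, 2600] is an assumed value for illustration.
    def trigger_pattern(tdc_hits, lo=2400, hi=2600):
        """tdc_hits: {TDC channel (1-12): list of hit values (up to 6)}.
        Returns the set of trigger types with a hit in the prompt window."""
        return {ch for ch, hits in tdc_hits.items()
                if any(lo <= h <= hi for h in hits)}

    # e.g. a T5 accompanied by a prescaled T3, plus a random hit on channel 1
    pattern = trigger_pattern({5: [2510], 3: [2497], 1: [740]})   # -> {3, 5}

    # Point 4: livetime from the "not busy"-gated clock (or Q10) channel
    # divided by the run-gated channel.
    def livetime_from_gated_clock(clock_not_busy, clock_total):
        return clock_not_busy / clock_total

    lt = livetime_from_gated_clock(9.3e5, 1.0e6)   # -> 0.93, deadtime 7%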


---------  Checks on Deadtime and Other Problems ------------

  Since writing the above epic tome, there have been a 
few developments.  We added in the scalers a channel that
counts the random T5.  The electron arm signal was delayed
several microseconds (I think it's about 3 microseconds) and
AND'd with the hadron arm signal, and this was fed into a 
scaler.  Another development is that the E89003 experiment
has observed for several runs that its deadtime correction
for T5 is much larger than that for T1 or T3, and is typically
40% even at fairly low rates.  The effect is not yet understood.
The danger this poses to normalizing an experiment illustrates
why it is very important to check the following items during
data taking.  If all these checks work out, you can probably
calculate the deadtime correctly.

  1. Scalers Working ?
     Check that there are no large asymmetries in rates,
     apart from what might be expected.  Check that the
     sum of rates on 6 scintillator paddles is approximately
     the trigger rate, and that the S1 and S2 planes have about
     the same rates (within factors expected by geometry).
     Not only the online scaler display but the offline 
     record of scalers should be checked.  

  2. Cross Section ?
     A rough check of the cross section at the 20% level should
     be done using scaler information and charge.  

  3. Deadtimes the same for every event type ?
     The % deadtime for T1,T3, and T5 should be the same.
     This can be checked by counting how many of each event
     type appear in the CODA file and how many are expected
     from scalers and prescale factors.  Be careful -- CODA
     writes to a data file but when you read this there are
     many ways to artificially reject events and miscount.
     For example, the "missing gate" and "extra gate" events
     are rejected by ESPACE before any histograms are incremented.
     Look at the raw data !

  4. The TDC that latches the trigger pattern makes sense ?
     The TDC trigger pattern (note 3, Paul Ulmer's idea, see
     above) could be used to explicitly sort the events that 
     belong to two trigger types (e.g. a T5 accompanied by a T3)
     and calculate deadtimes directly, because you know precisely
     how many of each event type were accepted and there are
     no ambiguities from overlaps.  Also, the event type should
     be correlated in a sensible way to the trigger pattern.
     For most events the event type will simply equal the
     pattern number.  But there may also be triggers 1 or 3
     which accompany trigger 5, for example.  Some combinations
     are illegal, for example an event type 2 should not be
     accompanied by a trigger 1, since they are exclusive.
     But be careful -- the TDC that latches this pattern has
     a 5 microsecond window.  There are random overlaps.  It is
     necessary to look at the TDC distribution, recognize the
     region of real events and apply a cut.

  5. Sum rule for TS-out
     The scaler channel counting accepted triggers should equal the 
     sum of event types 1-12 and 14 in the CODA file (in practice
     only types 1-5, 8, and 14 occur at present).  A small sketch
     of this check is included after this list.

  6. Check of randoms
     A useful check that T1, T3, and T5 make sense is to compute
     the random T5 rates as many ways as possible.  Examples of
     things that can (and have) gone wrong: a) An electronics channel
     is wrongly terminated and the reflections cause double-counting;
     b) PMT's double pulse with some probability; c) A piece of
     electronics fails affecting one rate but not the others.

     There are presently 3 ways to check the randoms.  They should
     agree !  (A small sketch comparing them follows this list.)

       a) Calculate them from T1 and T3 rates
       b) Calculate them from T5 rates and the TC spectrum.
          You simply correct T5 for the ratio of trues to accidentals.
       c) There is a scaler channel that counts the random
          T5 directly.  The E-arm signal and H-arm signal
          are AND'd, and the E-arm signal has a ~3 microsec
          delay, so the AND's are random.
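
  Here is a minimal sketch of checks 5 and 6.  The names are
placeholders, and the resolving-time formula in randoms_from_singles
is a standard accidental-coincidence estimate assumed here, not a
formula quoted elsewhere in this note.

    # Check 5: the accepted-trigger (TS-out) scaler should match the number
    # of physics events (types 1-12 and 14) counted in the CODA file.
    def ts_out_sum_rule(ts_out_scaler, counts_by_event_type):
        physics_types = list(range(1, 13)) + [14]
        n_events = sum(counts_by_event_type.get(t, 0) for t in physics_types)
        return ts_out_scaler - n_events          # should be close to 0

    # Check 6: three estimates of the random T5 rate, which should agree.
    def randoms_from_singles(rate_t1, rate_t3, coinc_window):
        # accidental rate ~ R1 * R3 * coincidence window (assumed estimate)
        return rate_t1 * rate_t3 * coinc_window

    def randoms_from_tc_spectrum(rate_t5, accidental_fraction):
        # scale the T5 rate by the accidental fraction seen in the TC spectrum
        return rate_t5 * accidental_fraction

    # The third estimate is read directly from the delayed-AND scaler channel.
    est_a = randoms_from_singles(2.0e4, 1.5e4, 100.0e-9)    # -> 30 Hz
    est_b = randoms_from_tc_spectrum(500.0, 0.06)           # -> 30 Hz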


  ------- Causes of Deadtime   ----------------------------

  Here are some causes of DAQ deadtime.  I'll classify them
  as "under control" and "hard to predict".  The "under control"
  deadtimes can probably be measured and understood with scope
  measurements.  The "hard to predict" deadtimes come and go
  and may be impossible to predict from features of your data
  such as event size.  

  (See also the FAQ in dtime_faq.html.)

  I. Deadtimes which are under control

  1. Fastbus readout (fortunately the dominant one with CODA 2.0)

         Can be measured on a scope, was of order 1 - 1.4 msec
     for CODA 1.4 (my dim recollection).  Will depend on event size.
     With the 5 microsec (presently 1.5) window on TDC-1877, there will 
     be an event-size effect at 10's of kHz.  Will be different for 
     different crates and event types.  

  II.  Deadtimes which are hard to predict

  1. Network
   
       This is partly under control, when the network is behaving,
       because it should be measurable.  The unpredictability comes 
       from the fact that it was a competing deadtime for CODA 1.4
       and it was sometimes bad, but not known why.  This is much 
       less a problem with CODA 2.0 setup.

  2. Disk I/O
 
       When reading rapidly from the same partition to which we
       are writing, the disk head jumps around.  It can cause a
       significant deadtime.  (We avoid this now.)

  3. Online Analyzer / Event Builder

       A main possibility is having too many processes on the
       workstation, which causes the event builder to be swapped
       out of memory.  
       We've gotten better at avoiding these problems.

  4. Various Wrongly Set Parameters in CODA

       This did not affect E89003, but it happened recently.



  --------  Monte Carlo by M. Liang ---------

  A Monte Carlo was written by M. Liang for the E89003 experiment
  which calculates the deadtime corrections.  See 

       http://www.jlab.org/~mliang/deadtime_cor/

  and ask Meme for details.

This page maintained by rom@jlab.org