ESPACE and ntuples
     _________________________________________________________________
   
     * To: halla_espace@jlab.org
     * Subject: ESPACE and ntuples
     * From: Ole Hansen 
     * Date: Tue, 17 Jul 2001 17:07:40 -0400
     * Reply-To: Ole Hansen 
     * Sender: owner-halla_espace@jlab.org
     _________________________________________________________________
   
Dear friends of ESPACE,

here is some hopefully useful information on how to write ntuples
with ESPACE.  If you are planning to produce ntuples, please read
this message thoroughly.  This is especially true if you are planning
to use the new 2.9.2 version of ESPACE.

Best regards,
Ole

BASIC PROCEDURE
~~~~~~~~~~~~~~~
The basic procedure for creating ntuples is as follows:

  set/file/output file.hbook           <- Open output file FIRST!
  set/ntuple on                        <- Switch to ntuple mode
  spectra/save var1/var2/var3 ...      <- Define ntuple

  file/scan rawdata.dat                <- Analyze some data

  spectra/write                        <- Write all spectra to HBOOK file

This will create an HBOOK output file containing one ntuple with the
columns "var1", "var2", "var3", etc.  The variables must be defined in

   espace_lib/block_data_var.f           and
   espace_halla/variable_user_init.f

Variable names are usually prefixed by spectrometer and detector names,
e.g. spec_e.s1.adc_l

It is very important to open the output file BEFORE defining any ntuples
so that all ntuples become "disk-resident". Otherwise, you will run
out of memory very quickly when analyzing many events.

As of ESPACE 2.9.2, ntuples are created in column-wise format, which
is far more efficient than the previously-used row-wise format.

Also as of ESPACE 2.9.2, any dots in the variable names are replaced by
underscores to prevent confusion in PAW.  For example, spec_e.s1.adc_l
becomes spec_e_s1_adc_l.  This replacement is done only for ntuples,
NOT for histograms.
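
When preparing PAW macros or other analysis code, it can help to predict
the column names in advance.  The following is a minimal Python sketch of
the renaming rule described above; the function name is made up for
illustration and is not part of ESPACE.

```python
def ntuple_column_name(espace_name: str) -> str:
    """Return the ntuple column name for an ESPACE variable name.

    Models the renaming rule described above: every dot in the
    variable name becomes an underscore.
    """
    return espace_name.replace(".", "_")


print(ntuple_column_name("spec_e.s1.adc_l"))  # spec_e_s1_adc_l
```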

There are several limitations to defining ntuples via ESPACE macros
and the COOLHANDS package:

- The number of variables in an ntuple must be 64 or less; and
- the length of the "spectra/save" command line cannot exceed 1023 characters.

(The latter is usually the more severe restriction.)
Also, ntuples must contain at least two variables because an ntuple
definition is recognized by the "/" character in the spectra/save command.
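
The three rules above can be checked before a macro is run.  The sketch
below is a hypothetical helper, not part of ESPACE or COOLHANDS.  It
assumes that the 1023-character limit applies to the complete line,
including the "spectra/save " prefix.

```python
MAX_VARIABLES = 64       # limit on variables per ntuple, from the text above
MAX_LINE_LENGTH = 1023   # limit on the command line length, from the text above


def check_ntuple_command(variables):
    """Build a spectra/save line and return (line, problems).

    Assumption: the length limit covers the whole line, including the
    "spectra/save " prefix.
    """
    line = "spectra/save " + "/".join(variables)
    problems = []
    if len(variables) < 2:
        problems.append("need at least two variables (no '/' in the definition)")
    if len(variables) > MAX_VARIABLES:
        problems.append(f"{len(variables)} variables exceed the limit of {MAX_VARIABLES}")
    if len(line) > MAX_LINE_LENGTH:
        problems.append(f"line has {len(line)} characters, limit is {MAX_LINE_LENGTH}")
    return line, problems


line, problems = check_ntuple_command(["spec_e.s1.adc_l", "spec_e.s1.adc_l_c"])
print(line)
print(problems)  # [] means the definition passes all three checks
```

Under the same assumption, 64 names of 15 characters each (the length of
spec_e.s1.adc_l) already give a line of 1036 characters, which illustrates
why the length limit tends to be reached before the variable limit.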

Multiple ntuples can be defined via repeated "spectra/save" commands.
Ntuples and histograms can be freely mixed.  Before defining two-dimensional
histograms, however, make sure to switch to histogram mode with
"set/ntuple off". For example:

   set/file/output file.hbook <- Open output file FIRST!
   set/ntuple on              <- Start defining ntuples
   spectra/save var1/var2     <- This defines an ntuple with 2 columns

   set/ntuple off             <- Start defining histograms
   spectra/save var1/var2     <- This defines a 2d histogram

   set/ntuple on               <- Start defining ntuples again
   spectra/save var3/var4/...  <- define another ntuple

   file/scan ...               <- Analyze data, fill spectra

   spectra/write               <- write the two ntuples and one 2d histogram
                                  defined above to file.hbook


ARRAYS & MULTIHIT QUANTITIES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In principle, arrays can be written to ntuples without any special ado.
Similarly, variables that have several entries per event (e.g. VDC wire
number) can be included in ntuples without loss of information.

In doing so, however, it is important to ensure that array index ranges are
consistent within the same ntuple; otherwise unexpected (and undesired)
truncations may occur.  Example:

   set/ntuple on
   spectra/save spec_e.s1.adc_l/spec_e.s1.adc_r/spec_e.s1.tdc_r/spec_e.s1.tdc_l

This is problematic because each of the four variables in this ntuple has
a different index range, viz.  (see espace_lib/index_range.f):

   spdetector.sp(ispectro).trigger.sc(ip_sc).nhits.adc.left
   spdetector.sp(ispectro).trigger.sc(ip_sc).nhits.adc.right
   spdetector.sp(ispectro).trigger.sc(ip_sc).nhits.tdc.left
   spdetector.sp(ispectro).trigger.sc(ip_sc).nhits.tdc.right

For instance, 2 paddles on this scintillator's left side and only 1
paddle on its right side might have had a hit for a certain event. Of the
two paddles on the left side, only one has a good TDC signal. Therefore,
the index ranges of the four variables in the ntuple above would be 2,1,1,1.

In such a case, COOLHANDS will simply truncate the index range entered into
the ntuple (or histogram!) to the smallest one consistent with all the
variables. In the case above, only one entry would be made into the ntuple,
even though two values exist for the first variable.
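
The effect can be modelled in a few lines of Python.  This is only an
illustration of the truncation, not COOLHANDS code.  The values are made
up, and it is assumed here that the first entries are the ones kept.

```python
def fill_rows(values_by_variable):
    """Model the truncation: one row per index up to the shortest variable.

    values_by_variable maps a variable name to its list of per-hit values
    for a single event.  zip() stops at the shortest list, which mimics
    the behaviour described above.
    """
    names = list(values_by_variable)
    return [dict(zip(names, row)) for row in zip(*values_by_variable.values())]


# Made-up values for one event with index ranges 2, 1, 1, 1.
event = {
    "spec_e.s1.adc_l": [512.0, 498.0],
    "spec_e.s1.adc_r": [530.0],
    "spec_e.s1.tdc_r": [1204.0],
    "spec_e.s1.tdc_l": [1187.0],
}
rows = fill_rows(event)
print(len(rows))  # 1: the second adc_l value is dropped
```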

The solution is to have only those variables in the same ntuple whose
index ranges are compatible. This means that you often have to create
several ntuples.  For example, the case above is properly handled
like this:

  spectra/save spec_e.s1.adc_l/spec_e.s1.adc_l_c
  spectra/save spec_e.s1.adc_r/spec_e.s1.adc_r_c
  spectra/save spec_e.s1.tdc_l/spec_e.s1.tdc_l_c
  spectra/save spec_e.s1.tdc_r/spec_e.s1.tdc_r_c

Note that it is necessary to put a second, compatible variable in the
ntuple to be able to define the ntuple at all.

Using the above definitions, one can write all available information to
the output file. However, since the information gets distributed over
several ntuples, the data become hard to analyze. You probably need special
analysis code other than PAW to do this properly.

To find out the index range for each ESPACE variable, you must inspect
the routines

   espace_lib/index_range.f        and
   espace_halla/index_range_user.f

manually.
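
Once the index ranges have been looked up by hand, grouping the variables
into compatible ntuples can be automated.  The following Python sketch is
a hypothetical helper.  The labels in the table are shorthand chosen for
this example and must be filled in from the routines named above.

```python
from collections import defaultdict


def group_by_index_range(index_range_of):
    """Return (commands, singles) for a hand-made index range table.

    index_range_of maps a variable name to a label for its index range.
    Variables with the same label go into the same spectra/save line.
    A variable that is alone in its group is returned in singles, since
    an ntuple definition needs at least two variables.
    """
    groups = defaultdict(list)
    for name, label in index_range_of.items():
        groups[label].append(name)
    commands, singles = [], []
    for names in groups.values():
        if len(names) >= 2:
            commands.append("spectra/save " + "/".join(names))
        else:
            singles.extend(names)
    return commands, singles


# Table filled in by hand; the labels are illustrative shorthand.
index_range_of = {
    "spec_e.s1.adc_l": "nhits.adc.left",
    "spec_e.s1.adc_l_c": "nhits.adc.left",
    "spec_e.s1.adc_r": "nhits.adc.right",
    "spec_e.s1.adc_r_c": "nhits.adc.right",
    "spec_e.s1.tdc_l": "nhits.tdc.left",
}
commands, singles = group_by_index_range(index_range_of)
for command in commands:
    print(command)
print("still need a partner:", singles)
```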

Fortunately, many variables are scalars, which can be put in a single
ntuple without restrictions.

In addition to implicit index ranges, it is possible to use explicit array
indices with variables in ntuples.  For instance

  spectra/save spec_e.s2.tdc_r[2;3]/spec_e.s2.tdc_r_c[2;3]

makes an ntuple of S2 TDC values for paddles 2 and 3 only, and

  spectra/save spec_e.s2.tdc_r[3;3]/spec_e.s2.tdc_r_c[3;3]

does the same, but only for paddle 3.

To allow easy identification of array and multihit quantities,
ESPACE, as of version 2.9.2, automatically adds the following
three columns to each ntuple created:

   Event
   Index
   Subdet

"Event" is the event number as provided by CODA (variable nr_interrupts in
espace_halla/rawdata_til_eof.f).

"Index" is the hit number. It runs over the range returned by
espace_lib/index_range.f and espace_halla/index_range_user.f.
This is typically the number of hits in a wire chamber plane, the
number of scintillator paddles fired, or the number of shower blocks
activated.

"Subdet" is the "subdetector number" as returned by espace_lib/det_sub.f
or espace_halla/det_sub_user.f. This is typically the wire number,
paddle number, shower block number, or similar.

An explicit array index given in the ntuple (or histogram) definition, e.g.

  spectra/save spec_e.s2.tdc_r[2;3]/spec_e.s2.tdc_r_c[2;3]

always restricts the value of "Subdet". In this case, 2 <= Subdet <= 3.
Since, with the new variables in the ntuple, it is easy to make
such cuts when inspecting the ntuple with PAW, it is not normally
necessary to specify array indices explicitly.

Incidentally, a definition like

  spectra/save spec_e.s2.tdc_r[2;3]/spec_e.s2.tdc_r_c[4;5]

will produce an empty ntuple since the subdetector ranges of all
variables are never simultaneously satisfied. Likewise

  spectra/save spec_e.s2.tdc_r[1;3]/spec_e.s2.tdc_r_c[3;6]

will contain Subdet = 3 only.
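
Both cases follow from intersecting the explicit ranges.  As a sketch of
that reasoning (an illustration in Python, not ESPACE code):

```python
def common_subdet_range(ranges):
    """Intersect explicit [lo;hi] ranges; return None if nothing overlaps."""
    lo = max(r[0] for r in ranges)
    hi = min(r[1] for r in ranges)
    return (lo, hi) if lo <= hi else None


print(common_subdet_range([(2, 3), (4, 5)]))  # None: the ntuple stays empty
print(common_subdet_range([(1, 3), (3, 6)]))  # (3, 3): Subdet = 3 only
```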

As with hit indices, it is equally important not to mix quantities in
an ntuple whose subdetector ranges are incompatible.  For instance,
one should not mix scintillator and shower counter TDC data in a
single ntuple.  If this is done, the value of "Subdet" is undefined,
and only a subset of the available data will be entered in the ntuple,
usually without any clear relationship between the different columns
of each row.

FILE FORMAT
~~~~~~~~~~~
As of version 2.9.2, ESPACE output HBOOK files are written with
a record length of 4096 instead of 1024.  This requires that the
files be opened with a command like

  his/file 1 file.hbook 0 x

where the 0 indicates that the record length should be determined
automatically.  Of course,

  his/file 1 file.hbook 4096

will also work, but only for new files, not for old ones.

The PAW++ browser adds the "0 x" parameters automatically, so nothing special
needs to be done.

New HBOOK files are written in the "new" RZ format, which does not
have a limit of 64k records.  This change is transparent to the user -
PAW and HBOOK automatically detect the new format.  CERNLIB >= 94B is
required to read the new files, which should not normally be a problem
nowadays, unless you are still working on Ultrix or an ancient HP box
perhaps.

As of ESPACE 2.9.2, the maximum file size of output HBOOK files has
been increased to 1GB.  Prior versions had a limit of 125MB, which was
reached often when trying to write large ntuples.  The 1GB limit can be
raised to the file system limit (usually 2GB) by increasing the record
length in espace_lib/open_file.f still further.
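
The link between record length and maximum file size can be made explicit
with a short calculation.  The Python sketch below assumes that the record
length is counted in 4-byte words and that the number of records per file
is capped at 32000 before version 2.9.2 and at 65000 afterwards.  None of
these values is stated in this message; they are chosen because they
roughly reproduce the limits quoted above.

```python
def max_file_bytes(record_length_words, max_records, bytes_per_word=4):
    """Largest file size for a given record length and record cap."""
    return record_length_words * bytes_per_word * max_records


MIB = 1024 * 1024

# Assumed values, not taken from the ESPACE sources.
old_limit = max_file_bytes(1024, 32000)
new_limit = max_file_bytes(4096, 65000)

print(old_limit / MIB)  # 125.0
print(new_limit / MIB)  # 1015.625
```

With a fixed cap on the number of records, doubling the record length
doubles the maximum file size.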

LIMITATIONS
~~~~~~~~~~~
The discussion of arrays above reveals that it is essentially
impossible to put a complete description of events into a single,
easy-to-analyze ntuple produced by ESPACE.  This is partly a
limitation of HBOOK, but more so one of COOLHANDS.  HBOOK supports
variable-length array columns in column-wise ntuples, which could be
used to include multihit and array quantities, but there is no way to
define such entries with the current version of COOLHANDS.

Although COOLHANDS could be improved to add full ntuple support, we are not
planning to do so, but instead will provide this capability in the new
C++ analyzer.

It is still possible to get complete event information from ESPACE;
however, the data will in general be split between a number of different
ntuples in the same file.  As a result, a complete analysis of such data
usually requires custom analysis code outside of PAW.
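
One possible shape for such custom code is to regroup the rows of all
ntuples by the "Event" column.  The Python sketch below assumes that the
rows have already been read into lists of dictionaries; how they are
exported from the HBOOK file is not covered here.  The labels and values
are made up.

```python
from collections import defaultdict


def collect_by_event(ntuples):
    """Regroup rows from several ntuples into one record per event.

    ntuples maps an ntuple label to a list of row dicts, each holding an
    "Event" entry.  The result maps each event number to a dict of
    label -> list of rows, so hits from different ntuples stay separate
    but can be inspected together.
    """
    events = defaultdict(lambda: defaultdict(list))
    for label, rows in ntuples.items():
        for row in rows:
            events[row["Event"]][label].append(row)
    return events


# Made-up rows standing in for two ntuples from the same file.
ntuples = {
    "adc_l": [
        {"Event": 17, "Index": 1, "Subdet": 3, "spec_e_s1_adc_l": 512.0},
        {"Event": 17, "Index": 2, "Subdet": 4, "spec_e_s1_adc_l": 498.0},
    ],
    "adc_r": [
        {"Event": 17, "Index": 1, "Subdet": 3, "spec_e_s1_adc_r": 530.0},
        {"Event": 18, "Index": 1, "Subdet": 5, "spec_e_s1_adc_r": 477.0},
    ],
}
events = collect_by_event(ntuples)
for number in sorted(events):
    print(number, {label: len(rows) for label, rows in events[number].items()})
```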

Maintained by Ole Hansen (ole@jlab.org)
Last modified: Fri Sep 20 22:33:41 EDT 2002