Info About Work and Cache Disks in Hall A

Robert Michaels, rom@jlab.org, Jefferson Lab Hall A, updated Aug, 2001

General information about JLab computers (e.g. what a work or cache disk is) is available from the Computer Center. This page covers the management of the Hall A work and cache disks.

WORK DISK on JLab CUE

We have 1.25 Terabytes of scratch work disk on JLab's Common Unix Environment (CUE), split across 14 physical partitions: twelve of 50 Gbytes, one of 350 Gbytes, and one of 300 Gbytes. Typically the 50 Gbyte partitions are dedicated to "contemporary experiments"; the two larger ones are shared by everyone (including contemporary experiments). Contemporary experiments include the one running, the one or two upcoming, and recent previous experiments.

To get to your work area, contemporary or not, follow the link under /work/halla. For example, for e98108 you "cd /work/halla/e98108" on JLab computers. To write there you must be in the Unix group corresponding to your experiment, a-e98108 in this example. Ask me to be added to a group, or with other questions. Some experiments have two disk areas, like /work/halla/e97103/disk1 and /work/halla/e97103/disk2; usually one of these is the "dedicated" disk and the other points to a region on a big "shared" disk.
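
As a rough illustration of the group requirement, the Python sketch below checks whether the current account belongs to the group for a given experiment. It assumes that group names always follow the "a-" plus experiment name pattern, which generalizes from the single a-e98108 example above; the function names are hypothetical and not part of any JLab tool.

```python
import grp
import os


def expected_group(experiment):
    # Assumption: group names follow the "a-<experiment>" pattern of a-e98108.
    return "a-" + experiment


def current_group_names():
    """Return the names of the Unix groups the current process belongs to."""
    names = []
    for gid in set(os.getgroups()) | {os.getgid()}:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # group id with no name entry on this machine; skip it
    return names


def in_experiment_group(experiment, group_names=None):
    """True if the (given or current) group list contains the experiment group."""
    if group_names is None:
        group_names = current_group_names()
    return expected_group(experiment) in group_names
```

Calling in_experiment_group("e98108") on a JLab machine would then indicate whether the account should be able to write under /work/halla/e98108, provided the naming assumption holds.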

A "cleanup" script maintains free space on the work disks. In simple terms, the largest and oldest files are deleted nightly, if necessary, to keep some space free. For the dedicated disks, the spokespersons can elect to have me turn off the cleanup script, but it will always run on the big shared disks. The cleanup script is described in the appendix below.

Another "cleanup" procedure, in addition to the usual nightly one, deletes very old unused files. The criterion is that the modification date is more than 3 years old and the access date is more than 1 year old. This will be done unannounced every few months.
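
The age criterion can be sketched in Python as follows. This is only an illustration, not the actual cleanup tool: it assumes a "year" means 365 days (the page does not say), and the function names are hypothetical.

```python
import os
import time

SECONDS_PER_DAY = 86400
# Assumption: a "year" is treated as 365 days.
MAX_MTIME_AGE = 3 * 365 * SECONDS_PER_DAY  # modification date limit
MAX_ATIME_AGE = 1 * 365 * SECONDS_PER_DAY  # access date limit


def is_stale(mtime, atime, now):
    """True if modified more than 3 years ago AND accessed more than 1 year ago."""
    return (now - mtime) > MAX_MTIME_AGE and (now - atime) > MAX_ATIME_AGE


def find_stale_files(root, now=None):
    """Walk a directory tree and list files that meet both age conditions."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if is_stale(st.st_mtime, st.st_atime, now):
                stale.append(path)
    return stale
```

Both conditions must hold, so an old file that someone has read within the last year is kept.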

IMPORTANT: Work disks have no backups. Keep on /work only things that you don't mind losing. Do not confuse /work, which is scratch space, with the central /home fileserver.

When an experiment loses its disk, its data will be squeezed into one of the big general purpose disks, after first running "cleanup" with strict parameters to make space. Afterwards you may still follow the same link to get to your data.

ADAQ WORK DISKS

Thanks to Ole Hansen, we also have several hundred GBytes of work disk on the adaq Linux cluster (adaql machines, l = linux). Disks are /adaqlN/workM where N = 1,2... and M = 1,2,3... The running experiment may keep stuff there, e.g. huge hbook files, etc. But these disks are not backed up. Also, a cleanup script operates there (see appendix). At the end of the experiment, the adaq work disks are cleaned up (files erased) to make room for the next experiment.

CACHE DISK

In addition to /work, there are two kinds of /cache disk. One kind is "hidden" from users and is used to feed the batch farm from data in MSS. Another kind is open to users and is the temporary repository for MSS data. We presently have 1 Tbyte of the latter kind, spread among /cache/halla/EXPERIMENT where EXPERIMENT corresponds to different experiments. See the Computer Center Scicomp pages.

APPENDIX -- Cleanup scripts

For WORK Disks on JLab CUE -- Every night the disks are checked. If a disk is less than 95% full, nothing happens. If it is more than 95% full, files are deleted as follows: Initially, files greater than NB bytes are considered, then NB/(10**i), i=1,2,3...(loop) where NB is big. I.e., first we consider files larger than 10 Gbytes, then 1 Gbyte, then 100 Mbytes, etc. At each level of filesize, files are sorted by last usage, with the oldest files considered first. A file is deleted only if it has not been used within NDAYS (= 7). After each deletion the disk usage is checked; when it falls below 90% the deletion stops. This script can be disabled for the dedicated disks at the request of the spokesperson, but cannot be disabled for the general (shared) disks.
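
The selection logic for the CUE work disks can be sketched in Python as below. This is a simulation under stated assumptions, not the real script: disk usage is computed from the listed file sizes rather than read from the disk, NB defaults to the 10 Gbyte value from the example (taking 1 Gbyte as 10**9 bytes), "not used within NDAYS" is modeled as idle_days > ndays, and the loop ends once the size level drops below 1 byte. The names FileInfo and select_deletions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int          # bytes
    idle_days: float   # days since the file was last used


def select_deletions(files, capacity, start_pct=95.0, stop_pct=90.0,
                     nb=10 * 10**9, divisor=10, ndays=7):
    """Return the files that would be deleted, in deletion order."""
    used = sum(f.size for f in files)
    if 100.0 * used / capacity <= start_pct:
        return []  # disk is not full enough; nothing happens

    deleted = []
    upper, lower = float("inf"), nb
    while lower >= 1:
        # Files in this size level that have been idle longer than NDAYS.
        level = [f for f in files
                 if lower < f.size <= upper and f.idle_days > ndays]
        level.sort(key=lambda f: f.idle_days, reverse=True)  # oldest first
        for f in level:
            deleted.append(f)
            used -= f.size
            # Usage is checked after each deletion.
            if 100.0 * used / capacity < stop_pct:
                return deleted
        upper, lower = lower, lower // divisor  # next, smaller size level
    return deleted
```

Note that large files used within the last NDAYS are skipped, so smaller but older files can be deleted ahead of them.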

For CACHE disks /cache/halla (experiment and home directories) -- Every N hours each disk is checked. Independently of how full the disks are, usage is checked versus quotas. Typical quotas are 12% for "big" experiments and 5% for "small" ones. If a directory exceeds its quota, files are marked for early deletion using "jcache -d", according to the following decision method. (Note first that I do not actually delete files. They only get marked as candidates for deletion, and will be deleted if the disks get full.) Initially, files greater than NB bytes are considered, then NB/(2**i), i=1,2,3...(loop) where NB is big. I.e., first we consider >10 Gbyte files, then 5 Gbyte, then 2.5 Gbyte, etc. At each level of filesize, files are sorted by last usage, with the oldest files considered first. A file is marked for deletion only if it has not been used within NDAYS (typically NDAYS = 3). After each "jcache -d" the directory's usage is decremented and checked versus quota. When the usage falls within quota, the marking for deletion stops.
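
The quota-driven marking for the cache disks differs from the work disk cleanup in two ways: the trigger is the directory's quota rather than how full the disk is, and the size level is halved at each step. A minimal Python sketch, under assumptions: the quota is taken as a percentage of a given disk size, NB defaults to 10 Gbytes (10**10 bytes), and the function only returns the paths that would be passed to "jcache -d" without invoking it. The function name is hypothetical.

```python
def mark_over_quota(files, disk_size, quota_pct, nb=10 * 10**9, ndays=3):
    """files: list of (path, size_bytes, idle_days) for one directory.

    Returns the paths that would be marked for early deletion.
    """
    quota = disk_size * quota_pct / 100.0
    usage = sum(size for _path, size, _idle in files)
    marked = []
    upper, lower = float("inf"), float(nb)
    # Stop as soon as usage is within quota, or the size level is below 1 byte.
    while usage > quota and lower >= 1:
        level = [f for f in files if lower < f[1] <= upper and f[2] > ndays]
        level.sort(key=lambda f: f[2], reverse=True)  # oldest first
        for path, size, _idle in level:
            marked.append(path)
            usage -= size  # decrement the directory's usage after each mark
            if usage <= quota:
                break
        upper, lower = lower, lower / 2  # halve the size level
    return marked
```

If every remaining file has been used within NDAYS, the function returns with the directory still over quota, since recently used files are never marked.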

For ADAQ WORK DISKS in counting room -- The following algorithm maintains space on the ADAQ Linux scratch disks /adaqlN/dataM (N=1,2..) (M=1,2..). Every night each disk is checked. If a disk is less than 95% full, nothing happens. If it is more than 95% full, files are deleted as follows: Initially, files greater than NB bytes are considered, then NB/(10**i), i=1,2,3...(loop) where NB is big. I.e., first we consider files larger than 10 Gbytes, then 1 Gbyte, then 100 Mbytes, etc. At each level of filesize, files are sorted by last usage, with the oldest files considered first. A file is deleted ONLY IF it has not been used within NDAYS (= 14). After each deletion the disk usage is checked; when it falls below 80% the deletion stops. To prevent infinite looping, the script quits trying after a certain number of files have been considered; e.g. if the users "touch" all files within NDAYS, nothing gets deleted. The sorting by filesize tends to avoid removing source files. To further avoid this (and to avoid deleting "data") a file is not deleted if its name contains a key word like ".dat" or ".cxx" etc. At the end of an experiment anything might be deleted to make room for the next experiment.
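
The two safeguards specific to the ADAQ disks, the key word protection and the cap on how many files are considered, can be sketched in Python as below. The key word list contains only the two examples given above (the full list is not stated here), the cap value of 1000 is a placeholder since the real number is not given, and the function names are hypothetical. The input is assumed to be already ordered by size level and age.

```python
# Only the two key words named in the text; the real list is longer ("etc.").
PROTECTED_KEYWORDS = (".dat", ".cxx")


def is_protected(filename, keywords=PROTECTED_KEYWORDS):
    """True if the file name contains any protected key word."""
    return any(keyword in filename for keyword in keywords)


def pick_deletable(files, ndays=14, max_considered=1000,
                   keywords=PROTECTED_KEYWORDS):
    """files: iterable of (name, idle_days), in the order they are considered.

    max_considered is a placeholder for the unspecified cap that stops the
    script from looping forever.
    """
    deletable = []
    for considered, (name, idle_days) in enumerate(files, start=1):
        if considered > max_considered:
            break  # give up after looking at enough files
        if idle_days > ndays and not is_protected(name, keywords):
            deletable.append(name)
    return deletable
```

Because the match is on any occurrence of the key word in the name, a file such as "run.dat.old" would also be protected in this sketch.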

 

Usage on Assigned Hall A Work Disks
===================================
Snapshot on Sept 12, 2001

 Mount            /w mount    Size   % used  Links /work/halla/..           
fs3-bb:/fs3/work1 /w/work301  363 Gb   97%   various experiments 
fs3-bb:/fs3/work2 /w/work302   52 Gb   78%   e99117 
fs3-bb:/fs3/work3 /w/work303   52 Gb   94%   e98108 
fs5-bb:/fs5/work1 /w/work501  307 Gb    8%   various experiments
fs5-bb:/fs5/work2 /w/work502   51 Gb   12%   e97103
fs5-bb:/fs5/work3 /w/work503   51 Gb    1%   e00102
fs5-bb:/fs5/work4 /w/work504   51 Gb    1%   e99114
fs5-bb:/fs5/work6 /w/work506   51 Gb   84%   e94103/disk2 
fs5-bb:/fs5/work7 /w/work507   52 Gb   97%   ndelta/disk2 
fs5-bb:/fs5/work8 /w/work508   52 Gb   67%   e97111
fs5-bb:/fs5/work9 /w/work509   52 Gb   98%   e99007

R. Michaels -- e-mail: rom@jlab.org
