Douglas P. Kingston III
Ballistics Research Laboratory
Michael John Muuss
Ballistics Modelling Division
Ballistics Research Laboratory
Published in The Proceedings of the 1982 Summer USENIX Conference, Boston.
(The full source and documentation is available via anonymous FTP at ftp://ftp.arl.mil/arch/mdqs.tar.Z )
The Multiple Device Queueing System (MDQS) is designed to provide UNIX with a full function, modular, and consistent queueing system. The MDQS system has been designed with portability, expandability, robustness, and data integrity as key goals. MDQS is designed around a central queue which is managed by a single privileged daemon. Requests, delayed or immediate, are queued by non-privileged programs. Once queued, requests can be listed, modified or deleted. When the requested device or job stream becomes available, the daemon executes an appropriate server process to handle the request. Once activated, the request can still be canceled or restarted if needed. MDQS can serve as a delayed- execution/batch subsystem and replaces internally the functions of lpr(I) and at(I) as a minimum. MDQS provides the system manager with a number of tools for managing the queueing system. Queues can be created, modified, or deleted without the loss of requests. MDQS recognizes and supports both multiple devices per queue and multiple queues per device by mapping input for a logical device to an appropriate physical output device. Anticipating the inevitable, MDQS also provides for crash recovery.
The MDQS system has been developed at the U.S. Army, Ballistics Research Laboratory by Doug Kingston and Michael Muuss to support the work of the laboratory and is available to other UNIX sites upon request. The MDQS system is designed to be compilable on a standard V7 UNIX system.
* Unix is a trademark of Bell Laboratories.
Traditionally UNIX has been a small system aimed at experienced computer users. The early UNIX queuing systems followed the basic tools concept found throughout UNIX and provided only basic queuing services without any frills. Since then, UNIX has become a widely accepted system servicing a wide range of users and applications. With wider use has come greater demands on the software. The queuing software has been asked to provide the same kinds of services found on "larger" systems. In general, these requests have not been able to be satisfied with the currently released UNIX queuers. The authors, both researchers at the Ballistics Research Laboratory (BRL), have undertaken to develop a general purpose queuing system for UNIX as part of the continuing UNIX development project at BRL.
The Multiple Device Queueing System (MDQS) was designed to fill in the void in the released versions of UNIX caused by the lack of an adequate queuing system. The MDQS is collection of programs that share a common queue management system. There are "enqueuers" that place items in the queue. These are normally programs like lpr(1) and at(1), but could also be network servers accepting requests from another machine. The "dequeuers" are normally the programs that actually handle output onto the device, but if the device is not resident on the machine, the dequeuer is a network transfer program which will connect to an appropriate server process on a remote machine. The dequeuers are analogous to the UNIX programs lpd(8), and atrun(8). Queue management programs include the daemon, a status program, and a queue modification program.
Lack of a "full function" queuing system for UNIX was a problem: we wanted a number of features that are common to any queued request to be handled by the same mechanism in all cases. These include specification of start time, notice of completion, prioritization, and output limits. We needed the capability for users to list, modify, and delete the queue entries if necessary. The organization of the queues needed to allow for flexible queue administration. It should be possible to have more than one device servicing a single queue and likewise more than one queue should be able to feed a single device. The pairing of devices and queues in MDQS is table driven so that it can be reconfigured by simply editing a file which MDQS reads at runtime (not at compile time!).
The basic features of all queue requests include several which have been missing from other queuing systems. All requests are assigned a start time. If the start time is in the future, that request is delayed until that time for enabling. Every request is also assigned a priority. If two requests require the same resource, the one with the higher priority is run first, or if both have the same priority, the request with the earlier start time is run first. A request can be flagged indicating that the initiator of the request wants notification when the request has been processed. This notification is designed to work across network connections. The request has associated with it a forms requirement in recognition that device do not always have the materials we expect in them. Requests can stipulate limits on their output to avoid runaway outputs. Lastly, one can request multiple copies of the output without submitting the source file multiple times.
The heart of the queuing system is the daemon. The MDQS daemon periodically searches the queue for additions, deletions, or modifications. The daemon keeps several internal queues of requests. A delayed request queue holds all requests whose start time lies sometime in the future. The ready request queue's contains all requests that are ready to be processed but which are awaiting a free device. There is one ready request queue for each logical device that is supported. This logical device queue may be serviced by more than one physical device on a heavily loaded system. On another system, a "printer/plotter" type device may be servicing both a plot queue and a lineprinter queue.
Since users often make mistakes, the queuing system had to have provisions for monitoring and modification by the user community as well as control by the operations staff. The queue status program allows the user to get detailed information about his own requests and general information about other jobs in the queues. A queue modification program allows the user to change the priority, start time, limits and flags of his queued request, or he can delete it altogether.
Every job queued gets a "sequence" number which is unique for his UID and can be used to reference that job for modification, or deletion. In the global sense, every job is identified by the UID and "sequence" number combination, but for the individual user, the sequence number will usually be a small integer. Increasing sequences of small integers on a per user basis is both easier to think about and will map more directly to order actions in the users mind.
Finally, the queuing system is designed to recover from crashes or program failures by verifying each step as it is taken. For instance, active queue files are not deleted until the request has been completed by the MDQS to the best its ability. After a crash, the daemon will try and restart the job that was running at the time of the crash. Where possible, duplication of output is eliminated by noting which portions of a job have been completed.
There are several factors behind the design of the MDQS. The primary design criteria is that the system should be modular. We have the need to queue output to a large number of different devices on different machines. The MDQS will allow requests to be queued on a machine regardless of whether the actual device resides there. We wanted it to be possible to add and delete active queues as the status of hardware and system configurations changed. Lastly, we didn't want to have a different queuing system to maintain for each device. This means less work for the systems staff, and one common interface for the user community to learn. All configuration dependent information is read in at runtime by those MDQS programs that need to know, enabling us to have transportable binaries among homogeneous machines which greatly simplifies queue management.
An example of the use of the MDQS will be helpful is showing some of the capabilities of the system. A typical user might start the day by submitting 3 requests to the print queue to be printed on 8x11 paper. He later finds out that the computer room is out of 8x11 paper. One of his printouts is needed for a meeting, so he decides to print it on 15x11 instead. He does a queue status command to find his print requests ids. Using the queue modify command, he changes the forms on his second request to 15x11 and asks that he be notified when the request is completed. Twenty minutes later he receives a electronic message from the queue daemon announcing that his second print request has completed.
Later that day, the user queues 2 "batch" jobs to be run after midnight and again does a queue status command. The who 8x11 prints (id=1,3) are still there, and the new batch jobs (id=4,5) are now also shown. Finally, before he leaves for the night, he decides to start running the first batch job with the hope that it will be done by the time he is finished with dinner. Using the qmodify command he changes the start time of the job to the current time. That evening he logs on to get his mail and finds his first job has completed successfully.
The operators for the system could also issue the commands demonstrated above but if they wanted to access other than their own requests, they would have to specify the user as well as the request id. Operators also need to be able to inform MDQS of changes in the queuing system (e.g. changes in paper type, enabling of a second printer). To make these changes, the operator just edits the MDQS configuration file. The queue daemon detects the changes and modifies its internal tables accordingly.
The queuing system uses a single set of queuing directories. Figure 1 shows the directory hierarchy that is currently used. The sequence of events require that the queuing be done in two steps by two processes.
Another area of strong concern is the security and integrity of the data and accounting information. In order to preserve the integrity of the user's data, all MDQS processes run with the UID and GID of the user when accessing his files, including those files in the queue directory. The queue directories are generally writable to allow "enqueuers" running as a user to create files, but the queue directories are protected by a "lock" directory which is only entered by programs which are setuid to the MDQS UID. Those programs fall under the the category of "trusted programs" and are maintained by the systems staff.
One of the common places for security problems on all computer systems is in the area of "queuing systems", and we have had our on share of headaches with the old lineprinter spooler. The approach taken with MDQS was to always access data unprivileged. In addition, since the dequeuing process will be running with the user's real uid and gid, it will be possible to print protected files without copying them.
The enqueuing process currently consists of two steps which are normally separated into two processes. The first process runs unprivileged and creates uniquely named files in the qtmp directory. One of these, the control file, will contain the names of any other queued files. When all the files have been created, the first process calls a program that is setuid to the owner of the queue directories. This program is passed the name of the control file as an argument. The process is able to traverse the qlock directory, and after chdir'ing to the qhome directory, returns his effective UID and GID to the user making the request so that he can modify the the control file. All the data files are linked into the qdata directory, and finally the control file is linked into the qcontrol directory. Throughout this enqueuing, all the files remain readable and writable only to the user making the queue request.
The BRL/JHU accounting systems relies on the ability of a parent process to inherit the accounting information of a child process. Orphaned children (processes whose parents die before they do) are inherited by INIT (process id 1). INIT collects the accounting information and updates the accounting records. In order to properly account for the costs of processing a queue request, we could not have the daemon inheriting the dead "dequeuer" processes. This would have charged all queue processing to the MDQS account.
The need to doubly detach the servicing process causes a problem for the daemon since it now more difficult to monitor the status of that program. The strategy to handle the "dequeuing" process was as follows. A request is chosen from the list of ready requests. The daemon forks and the child, referred to as the file-control process, sets up the input file descriptor from the control file, and the output file descriptor to the device. The filecontrol process sets real and effective user id's to the person who made the request. Finally, a pipe is created which will allow the filecontrol process to get the return code without doing a wait(). The filecontrol process then forks and executes the appropriate servicing process. When the servicing process is done, he writes an exit code onto the pipe to the filecontrol process and exits. The filecontrol process reads this result and then exits with this return code without doing a wait(). The result is that the daemon gets the return code but without inheriting the accounting. The service process is now inherited by INIT who logs the time and materials used.
As of 22 June, the current implementation of the MDQS consists of the daemon, a qstatus program, a lineprinter queuer and a lineprinter dequeuer. In development are a network server for the lineprinter queue, a "batch job" queuer and dequeuer, a typesetting queuer (local and network) and dequeuer, and the qmodify command.
MDQS does use locking code to help preserve the integrity of the queue and prevent collisions of competing servers. We have implemented the RAND exclusive open code and have been quite pleased (its upwards compatible!). Since others may not have this available, or may have a better mechanism, the locking code has been isolated to one module which can be replaced as needed.