Most system administrators work in an environment where they have
multiple ongoing tasks. In some cases, a system administrator might
have several hundred tasks in their list of "Things to do". These
tasks might range from the "In your copious spare time, please do this"
type of request, to "This project needs to be done this week" type of request.
Some projects are too big for a single ticket, and might need to be
broken into several other tickets. One of those tasks might be
"Install a new dialin mechanism for the users."
This paper discusses the software that
I have developed to help me with all these issues. Several user
patterns and some long term trends and their implications
are also discussed.
TTS is a Perl, Bourne shell, and mail based trouble tracking system
designed for use by System Administrators. It is different from bug
tracking systems and general purpose time tracking systems, in that
there are special needs for system administrators that are not met
by those other products.
Trouble tracking systems are used for many purposes by users,
administrators, and management. RFC 1297 lists eight uses
for a trouble tracking system, which seem to encompass the
uses I have seen. I refer the reader to the RFC for more
details on these uses, but present a synopsis here for those
unfamiliar with this RFC. These uses are:
1) Short-term memory and communication.
2) Scheduling and work assignment.
3) Referrals and dispatching.
4) Alarm clock.
5) Oversight by engineers and customer/site representatives.
6) Statistical analysis.
7) Filtering current alerts.
8) Accountability (CYA).
TTS has been used at sites with 22 users to 60 users. It is
currently in use at Objectivity in Mountain View, California,
and at KnowledgeSet Corporation, also in Mountain View, California.
The system has been tested with over 600 open tickets, and the
response time has been acceptable. It is expected that the system
can handle over 1000 open tickets without serious response time
delays, and can handle over 30000 closed tickets without an
unreasonable delay. (Depending upon your local hardware, network,
Version 0 was an electronic file that was edited with a text editor.
This file (while crude) contained the basic ideas of a trouble
tracking system. In particular, it detailed (in 80 column
format, no less) the requester, the request date, the problem, and
the estimated time it would take to do the task. It also had an
update history attached to each "Ticket". This was an indented
line underneath each ticket with a date and the action accomplished
on that date.
A primitive report generator was also part of V0: a script that took the estimated time to complete each task and totaled it. This was sufficient to show management that they had given me 26 weeks of work to do. Shortly thereafter, I was given a hiring requisition for another administrator.
Version 1 (V1) was implemented the week after LISA 8 in San Diego,
California. This was the conference where Req was introduced, and
it seemed that half the participants were talking about trouble
tracking systems. I decided that I should probably implement
something a little more official, and with a few more features
than the text files that we were currently using.
Version 2 (V2) was started in May of 1995, with the goal of making
Version 1 more portable, and maybe of use to other sites. Significant
enhancements were added to meet this goal. In particular, the "config.perl"
script was created to change site-specific variables in all of the
existing scripts. HTML enhancements were added after September of 1996
in order to let non-Unix users access the database. Most importantly,
the transition to a single file per ticket was started.
The concept of an "Administrator" comment was also added after LISA 9
(September of 1995, Monterey).
This was added directly because a certain administrator
(who shall remain nameless) mentioned that she thought a field
that could be seen only by administrators was a good idea. This
was so that an administrator could safely add the comment "This user is acting
like a rabid wildebeest, be careful!". This would warn any other
administrators that they should be careful, without letting non-administrators
know that they had been described as acting irrationally. (Of course, none of my
users ever acted like a "rabid wildebeest". Maybe a rabid pit-bull, but
never a rabid wildebeest.)
Version 3 was started in February of 1996, in a concurrent development
effort with version 2.85.
While bug fixes
and some minor enhancements were made to Version 2, the majority of
new development work was done to Version 3.0.
Version 3 was started after a review of
features of competing products was undertaken, and several interesting
features in these products were noted. These ideas came mainly from
"Req, version 1.XX", and "Gnats, Version X.XX".
The main difference between V3 and prior releases was that the idea
of a single file containing all the trouble tickets was finally
abandoned. While some residual code exists for the earlier paradigm,
it is no longer supported as such, and will eventually be phased out
in favor of a faster interface.
In the tradition of most quick hacks, an official list of system
requirements was not done until late in the project. While many
of the requirements were known at the start of the project, they
were not codified until the project was almost completed. The effect
of this was that there were several features that were not added
early in the project, when it would have been easy. There were
also several dead ends that were reached in the growth of TTS that
could have been avoided had a real requirements analysis been
done at the start. On the other hand, if a real requirements
analysis had been attempted at the start, it might never have even
What follows is the list of system requirements that developed
over the course of the project. Of note are the requirements marked
with an "*", as those are the requirements that were not among
the original, unwritten requirements.
-Easy for all computer users (Mac, Unix, PC, etc) to submit a trouble ticket.
-Easy for all computer users to view the current status of their trouble tickets.
-The ability to assign a priority level to each ticket.
-The ability to assign tickets to specific administrators.
-Administrators will be using Unix workstations.
-Administrators are comfortable with using text editors.
-There should not be a single specific interface to the trouble tickets.
Administrators should have their pick of tools to use.
"Hardware and Software platforms"
-Support for users on non-Unix workstations (VMS, PC-Dos, PC-Windows, NT, Macintosh, Next workstations, etc.)
-Support for administrators on multiple flavors of Unix (SunOS, AIX, Ultrix, HPUX, etc.). Not all of these may support a working network file locking mechanism.
-It should be usable across several machines of different architectures simultaneously.
-It should not require an expensive back-end database.
-Usable by multiple administrators simultaneously.(*)
-Users care more about open tickets than closed tickets.
-The ability to track how much time any one administrator has spent on any ticket, or on a select set of tickets.(*)
-Multiple report formats, including selecting tickets that are open or
closed, selecting by multiple field selection criteria, and with
multiple management oriented reports (Time spent per admin per week,
length of time in queue, average time used per ticket, etc.)
The basic security paradigm has three rules: 1) Any user can see
any ticket in the system, 2) We trust our users,
and 3) We need to help keep honest users honest.
It was decided early on that any determined individual would be
able to read, and possibly edit trouble tickets. This is an
unavoidable consequence of the design need of maximum visibility
to users of their trouble tickets. No major effort
has yet been expended to add security above the basic paradigm.
If security is needed above this paradigm, then direct user access to
the ticket database can and should be limited (with the corresponding
decrease in usability by administrators and users).
This security paradigm has had several effects upon the functionality
of TTS. Most notable is the ticket submission method. TTS
allows any user to submit a ticket, through either an HTML
interface, or through email. The alternative is to have
the system administrator (or a set of people at the "help"
desk) enter each ticket. As this was initially designed for a small
work environment where the systems group may consist of a single
over-worked administrator, this was unacceptable. Because of this,
it was decided that an email interface to submit and view tickets
was acceptable. An email method to modify or close tickets
was rejected as pushing the envelope of trusting users. As such,
the most damage a user could do is to flood the system with
bogus tickets until the disk partition is full. As this is no
different from the damage a malicious user can do anyway, it
was decided that this was an acceptable risk.
Also notable is the absence of a method of submitting tickets
that are already closed. This is due to the system's reliance upon
sendmail as a method of submitting tickets. At some future point,
there may be a method of creating tickets from within the main 'tt'
program, which would rely upon checking the user's UID against a
list of "allowed" UIDs. At this point, an administrator has to
create a ticket, and then manually close it. Submitting a ticket
that is already closed is a feature
that is included in at least one other trouble tracking system [X3].
A secondary effect is that any user can see the existence of
any other ticket in the system, including the one line problem
description, who submitted the ticket, and several other fields.
A "confidential" field has been added to the system which stops
a normal user from seeing any of the other details of a ticket.
Even so, it would still be difficult to change the system to
disallow access to all of the information in a trouble ticket.
This may be changed in a future release of the system. Once again,
if this is an issue, then the system can be configured to not allow
any access to the TTS system.
Submitting a ticket is the first step in using TTS. The core engine
is "tts.mail", which is called via an email alias, or indirectly
from an HTML form. This program assigns a ticket number, assigns
the ticket to an administrator, sends copies of the ticket
back to the user and to the administrator, and writes the ticket to the database.
This section of code can be configured to add multiple "cc" recipients,
if a site requires it. An example might be a director of Engineering
that wants to see every problem that is submitted to the system, as
it is submitted. (Of course, if you decide to start sending mail
from your complainer scripts, as well as output from root cron jobs,
then it should only be a matter of time before the people who
requested being "cc'ed" request to be taken off the list.)
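As a rough illustration of the submission steps described above, the following Python sketch numbers a ticket, assigns it, writes it to an open spool directory, and returns the list of people who should receive a copy. It is not the actual tts.mail code. The function names, the header field names, the directory layout, and the choice to assign every ticket to the administrator currently on the "hotseat" are all assumptions made for this example.

```python
import os


def next_ticket_number(*spool_dirs):
    """Return one more than the highest numeric file name found in any spool."""
    numbers = [
        int(name)
        for spool in spool_dirs
        for name in os.listdir(spool)
        if name.isdigit()
    ]
    return max(numbers, default=0) + 1


def submit_ticket(open_dir, closed_dir, requester, subject, body,
                  hotseat_admin, cc=()):
    """Write a new ticket file and return (ticket number, mail recipients)."""
    number = next_ticket_number(open_dir, closed_dir)
    # Hypothetical header layout; the real TTS ticket format is not shown here.
    text = (
        f"Ticket: {number}\n"
        f"Requester: {requester}\n"
        f"Assigned-To: {hotseat_admin}\n"
        f"Subject: {subject}\n"
        "\n"
        f"{body}\n"
    )
    # Mode "x" fails if the file already exists, so a ticket is never overwritten.
    with open(os.path.join(open_dir, str(number)), "x") as handle:
        handle.write(text)
    # A real system would now mail the ticket text to each recipient.
    return number, [requester, hotseat_admin, *cc]
```

Returning the recipient list instead of sending mail keeps the sketch testable; the optional `cc` argument corresponds to the configurable "cc" recipients mentioned above.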
"Replying to, editing and closing a ticket"
An administrator can use two methods to respond to a ticket, either
through an email program, or through the 'tt' program.
The advantage of using the tt program is
that it records the fact that a reply was sent (in the "last
customer contact" field), and that it will add an entry to the
log file indicating that a reply was sent.
It will also
create a "lock" on the ticket so that another administrator
cannot make modifications to the ticket.
This ability to use 'tt' or regular email ties in well
with the requirement of multiple interfaces to the system.
A similar ability exists with editing and closing tickets. An
administrator may either use the 'tt' program, or may use a
text editor or a mail program to edit and close tickets. (Closing
a ticket is just moving the ticket out of the "open" spool directory
to the "closed" spool directory.) The big advantage with using the
supplied tools is that they create lock files, and will "preload"
certain fields for the administrator. This preloading of fields
is beneficial in that it helps reduce the editing time of the ticket.
The lock files also allow more than one administrator to safely access
the trouble tickets without fear of their edits being lost by another
simultaneous edit. Of course, in a single administrator environment,
there is no worry of editing collision. It is felt that the benefits
of preloading fields make it worthwhile to use the 'tt' interface
when closing and replying to tickets.
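The lock file and "move to close" ideas described above can be sketched in a few lines of Python. This is only an illustration under assumed conventions (a `.lock` suffix next to the ticket file, and one file per ticket named by its number); it is not how the 'tt' program is actually written. The sketch relies on exclusive file creation, which may not be dependable on every network file system.

```python
import os


def acquire_lock(ticket_path):
    """Create ticket_path + '.lock' atomically; return False if it is taken."""
    try:
        fd = os.open(ticket_path + ".lock",
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    # Record who holds the lock so a stale lock can be investigated later.
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def release_lock(ticket_path):
    """Remove the lock file created by acquire_lock."""
    os.remove(ticket_path + ".lock")


def close_ticket(open_dir, closed_dir, number):
    """Close a ticket by moving its file from the open to the closed spool."""
    source = os.path.join(open_dir, str(number))
    if not acquire_lock(source):
        raise RuntimeError(f"ticket {number} is locked by another administrator")
    try:
        os.rename(source, os.path.join(closed_dir, str(number)))
    finally:
        release_lock(source)
```

An administrator who edits tickets directly with a text editor bypasses this check, which is the trade-off the paragraph above describes.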
The web server is an integral part of complying with the requirement
that multiple architectures can submit and view tickets. Early versions
of TTS were written with the intent of later writing client software
that would do RPC calls to a unix server. Luckily, the explosion of
web servers and browsers negated that coding headache. The web server
currently serves two purposes: the submission of trouble tickets,
and the viewing of several premade reports. These reports are
created every hour, and have allowed me to delay writing
certain HTML enhancements. (A form and cgi to the report generator.)
These reports also help keep the server load down, as most users
can usually find their ticket easily enough in one of the 12 premade
reports.
Making reports, in particular reports to management, is the whole
reason for having this system. As such, the basic
report generator resembles a VMS program more than a typical
Unix program (Large, with lots and lots of options, vs. small
with a few options.)
One of the design effects has been that the age of a ticket is
measured in days, instead of in hours, minutes, or seconds. This
is due to the nature of system administration requests. Very
few system requests come in with a highly time sensitive nature.
This can be contrasted to a Network Operations center, where tracking
a trouble ticket by the minute might be important (An example
of a time sensitive request might be "Router router15.isp.net
went down at 15:34, and the customer is off the Internet.")
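Measuring age in days keeps the arithmetic trivial, as the following Python sketch suggests. The function name and its arguments are hypothetical; the only idea taken from the text is that a ticket's age is counted in whole days.

```python
from datetime import date


def ticket_age_days(opened, closed=None, today=None):
    """Age in whole days: up to the close date, or up to today if still open."""
    end = closed or today or date.today()
    return (end - opened).days
```

For example, a ticket opened on `date(1996, 9, 2)` and still open on `date(1996, 9, 30)` has an age of 28 days.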
There is also a weekly status report writer. This was written in
response to management's desire to have a weekly report on what
the Systems Team did the prior week. It is typically run by hand
late Friday afternoon, and is then incorporated into a report
that details the upcoming projects, as well as the status of
current major projects, and that week's roadblocks. The report writer
goes through recently closed tickets to create the report.
It lists the
tickets that have been closed, their age in the queue, the average
length of time used per ticket, the time spent in the last week
on those tickets, and a total of all the time used on the tickets
since they were first opened. It also lists tickets that have
been worked on but not yet closed, and reports the same statistics
for those tickets.
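The statistics listed for closed tickets could be computed along the following lines. This Python sketch is an assumption-laden illustration, not the TTS report writer: the in-memory ticket layout (a dictionary with `number`, `opened`, `closed`, and a `work_log` of `(date, hours)` pairs) and the seven-day reporting window are invented for the example.

```python
from datetime import date, timedelta


def weekly_summary(tickets, week_end):
    """Summarize tickets closed in the seven days ending on week_end."""
    week_start = week_end - timedelta(days=6)
    rows = []
    for ticket in tickets:
        closed = ticket["closed"]
        if closed is None or not (week_start <= closed <= week_end):
            continue  # still open, or closed outside the reporting week
        total = sum(hours for _, hours in ticket["work_log"])
        this_week = sum(
            hours for day, hours in ticket["work_log"]
            if week_start <= day <= week_end
        )
        rows.append({
            "number": ticket["number"],
            "age_days": (closed - ticket["opened"]).days,
            "hours_this_week": this_week,
            "hours_total": total,
        })
    # Average total time used per closed ticket (0.0 when nothing was closed).
    average = sum(r["hours_total"] for r in rows) / len(rows) if rows else 0.0
    return rows, average
```

The same loop, with the `closed` test inverted, would give the second half of the report (tickets worked on but not yet closed).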
"Future Plans for TTS"
The future plans for TTS fall into four categories:
bug fixes, speed enhancements, HTML enhancements, and a major rework
of the report generator to allow better database querying ("and", "or", and
"not" statements, as well as parenthesized expressions.)
At this point in time, only the first three have been scheduled.
Bug fixes will be done as bugs are noticed. Minor speed enhancements
will be done as bugs are being fixed. There are no major speed
enhancements planned at this point in time. This is due to
a lack of a method, rather than a lack of time. The transition to
one ticket per file, and the addition of indexes were the last
two major speed enhancements. Putting the indexes into some kind
of DBM file is against several of the system's requirements for
usability and portability.
The addition of an HTML form to the report generator is planned.
Also in the pipeline is adding a graphical mode for the statistics
reports. An HTML interface to allow the editing of trouble tickets
should be completed by the end of the year. The presence of such a
feature might allow unauthorized users the ability to
edit tickets, as well as viewing confidential tickets or the "Admin
Comments" field. The full security ramifications of this have not
yet been detailed.
Adding better expression parsing is waiting for a volunteer to help
me with this, as this is beyond my comfort level as a programmer.
There are a few "quick hacks" which are under consideration, but
nothing definite has been planned.
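For a volunteer considering the expression parsing work, one common approach is a small recursive-descent evaluator. The Python sketch below is only a suggestion of the shape such code might take. The query syntax (`field=value` terms joined by `and`, `or`, `not`, and parentheses) and the dictionary representation of a ticket are assumptions, not part of TTS.

```python
import re

# A token is a parenthesis or a run of characters without spaces or parentheses.
TOKEN = re.compile(r"\s*(\(|\)|[^\s()]+)")


def matches(query, ticket):
    """Evaluate a boolean query such as 'status=open and not admin=bob'."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return token

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "or":
            take()
            right = parse_and()  # always parse, so every token is consumed
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():  # highest precedence: 'not', parentheses, field=value
        token = take()
        if token == "not":
            return not parse_not()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        field, separator, wanted = token.partition("=")
        if not separator:
            raise ValueError(f"expected field=value, got {token!r}")
        return ticket.get(field) == wanted

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

With this grammar, "and" binds tighter than "or", so `status=closed and admin=alice or priority=1` is read as `(status=closed and admin=alice) or priority=1`.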
Several interesting user usage patterns have emerged from the
three companies where I was intimately involved with the trouble
tracking system. These tend to break down into user related
issues, and long term trends.
"User Usage Patterns"
There is a distinct correlation of the usage patterns of TTS with a
"Product Life cycle" that sales and marketing people talk about.
Those phases are Introduction (1-2 weeks), Growth (2-6 weeks), Maturity
(7+ weeks), and Decline (The introduction of a new product for time
and problem tracking).
"The Introduction Phase"
The people who are most likely to use TTS during the Introduction
phase are a small group of people who will jump on any new technology (Early
Adopters). (Probably the same group of people who download and
compile the latest copy of gcc the day it's released!) These
users are important for several reasons. They help iron
out any installation bugs, and they also help create an atmosphere of
acceptance for the product. This is important because you can
use these users as implicit "peer pressure" to encourage other users
to start using the system. It is important that these users be
rewarded with prompt action on their requests. This is both a
reward for their being early adopters, and also helps them talk
about the "quick response" that they had after submitting a ticket.
This is important for the growth phase.
"The Growth Phase"
The Growth phase typically starts one to two weeks after the system
has been announced. The initial installation bugs have been worked
out, management has publicly endorsed the product, and the office
gossip has gotten around that submitting problems through TTS gets
results. The users who join during this phase are typically the programmer/engineering employees.
This is probably due to their current use of a bug/problem tracking
system for the main product of the company.
During this phase, the number of tickets entered into
the system will vary widely from day to day, and from week to week.
This is because of users deciding to submit tickets instead of
grabbing someone in the hallway, and from users submitting long
term or old requests into the system. This is the period of time
when the systems team has to start enforcing a policy where users
submit a ticket instead of making a phone call, or sending email
to their favorite administrator.
The "late adopters" of the growth phase are typically the non-engineering
employees (Sales, marketing, and administration). These users
will eventually start using the tracking system (if they are
going to use it at all). These users will probably need to have
a fair amount of one-on-one instruction about how to use the
system, and the benefits to them of using it.
The Growth phase is also a period of self training for the admin team.
All requests for help must
be entered into the system by the administrators. This is necessary
for long range tracking of tickets. The admin team must also
become used to the assorted interfaces during this period. And lastly,
the admins need to get used to closing tickets that have been finished.
Not doing any of these can lead to the self destruction of the system.
(What good are the statistics, if everyone knows that the data behind
them is bad?)
"The Maturity Phase"
The "Maturity" phase seems to start about two months after the
introduction of the tracking system.
This is indicated by the number of tickets
submitted every week stabilizing.
Most of the users who are
going to use the system are using it, and most of the old requests
and long term projects have already been submitted into the system.
It appears that the higher a user is in the management food chain,
the less likely they are to submit their own tickets. This becomes
very apparent during the Maturity phase.
This may be a natural outgrowth of management being used to giving
orders to their subordinates without having to give them explicit
instructions or fill out paperwork. The admin
team will just have to open trouble tickets for these users. (Like
you can tell the CEO to file a trouble ticket so that you
will work on his printer?)
"The Decline Phase"
The Decline phase is when another product is installed that supersedes
the current set of programs for trouble tracking. This might be due to
management purchasing a professional product, or another
set of programs from some future programmer, or maybe a significantly
improved new release of TTS. The important things to consider
when changing to a new product are keeping the old data available,
and maintaining a similar method for users to submit tickets. TTS
data can easily be re-submitted into a new system, as long as the
new system allows a program to submit data (NOT manual entry of each
ticket). Since each ticket is a single file under TTS, the work
involved should be minimal. By keeping a similar method for submitting
tickets, the pain of retraining users is minimized. This is important
because the time involved with training users can be lengthy, and
anything that can be done to minimize it will save time and money
in the long run.
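Because each ticket is a single file, re-submitting old data can be little more than a loop over the spool directories. The Python sketch below assumes that layout and takes the new system's import routine as a caller-supplied function; `resubmit_all` and the `submit` callback are hypothetical names, not part of TTS or of any particular replacement product.

```python
import os


def resubmit_all(spool_dirs, submit):
    """Feed every ticket file in the given spools to submit(name, text)."""
    count = 0
    for spool in spool_dirs:
        for name in sorted(os.listdir(spool)):
            path = os.path.join(spool, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories; only plain ticket files count
            with open(path) as handle:
                submit(name, handle.read())
            count += 1
    return count
```

The callback is where the new system's programmatic submission interface would be called, which is why a system that only accepts manual entry makes the migration painful.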
"Long Term Trends"
It seems that many users will not submit trouble tickets if the
administrator they favor is not the person that the ticket will
be assigned to. There have been many incidents where a user
would come by to see who the administrator of the week is, and
would then go away. Checking into this revealed that they were
waiting for their preferred administrator to be on the "hotseat".
Searching the database for tickets submitted by some of
those users indicated a definite preference for certain administrators.
(In one case, over 80% of one user's tickets went to a single
administrator.)
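A check of this kind amounts to counting, per user, which administrator their tickets were assigned to. The Python sketch below shows one way to do that; the field names `requester` and `assigned_to` and the sample data in the usage note are invented for illustration and do not come from the TTS database.

```python
from collections import Counter


def admin_preference(tickets, user):
    """Return (most frequent admin, share of the user's tickets), or None."""
    counts = Counter(
        ticket["assigned_to"] for ticket in tickets
        if ticket["requester"] == user
    )
    total = sum(counts.values())
    if total == 0:
        return None  # the user has never submitted a ticket
    admin, number = counts.most_common(1)[0]
    return admin, number / total
```

For a user with five tickets, four of them assigned to the same administrator, the function returns that administrator and a share of 0.8.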
Another interesting pattern is that ticket submission drops dramatically
when there is no live administrator to deal with a problem. For example,
when a conference or training session is scheduled, and the users are
aware of it, then the users do not submit tickets. There is then a
corresponding upswing of ticket submissions after the event is over.
Interestingly enough, this upswing never makes up for all the
unsubmitted tickets. This leads me to believe that many users really
can solve many of their own problems on their own. (How efficient
they are is another paper for yet another conference.)
One of the most serious trends is noticeable only after two or three
months' worth of data has been gathered. This trend is the "Not
enough help" trend. This is where you have 30 hours per week of
schedulable administrator time, but you are receiving 30+ hours
of requests every week. It is at that point that you can go to
your management and ask them to either hire more staff, or allow
you to reduce the number of tasks that the systems team is responsible
for. And if they refuse to do either, then maybe it's time to find
another manager. (At least you can quantify what kind of hole
you've found yourself in.)
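The arithmetic behind the "Not enough help" trend is simple enough to sketch. The Python function below accumulates a backlog from weekly requested hours against a fixed number of schedulable hours; the function name and the weekly figures in the usage note are made up for the example and are not measurements from any site.

```python
def weekly_backlog(requested_hours, available_hours):
    """Return the running backlog (in hours) at the end of each week."""
    backlog = 0.0
    history = []
    for requested in requested_hours:
        # Unused capacity cannot be banked, so the backlog never goes negative.
        backlog = max(0.0, backlog + requested - available_hours)
        history.append(backlog)
    return history
```

With 30 schedulable hours per week and requests of 28, 35, 40, and 33 hours, the backlog grows as 0, 5, 15, and 18 hours. A backlog that keeps rising over two or three months is the quantified "hole" to show to management.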
Thanks to Pat Deuchar of Quantum,
Mary Holstege and Robert Smith of KnowledgeSet Corp,
Eric Dietiker of Dow Jones Telerate Systems, and
John Jarocki of AMD
for their assorted comments on TTS and helping me learn HTML.
Many thanks to my wife Mercedes for putting up with late night
coding and documentation sessions.
TTS is available at
Eric Wedaa has been a Unix system administrator since 1989.
He currently works at KnowledgeSet Corporation, Mountain View,
California, as the Senior System Administrator (and the one and
only administrator). His career has consisted of working at
small companies in Silicon Valley, with one mistake of working
at a large chip manufacturing company. He has a B.S. in M.I.S.
from the University of Arizona.
RFC 1297, NOC Internal Integrated Trouble Ticket System Functional Specification Wishlist, D. Johnson, January 1992.
Elizabeth Zwicky, Getting More Work Out of Work Tracking Systems, in LISA VIII, pp. 105-110, San Diego, CA, 1994.
[X3] ?????, ?????????, ????, ????, ????.
Remy Evard, Managing the Ever-Growing To Do List, in LISA VIII, pp. 111-116, San Diego, CA, 1994.
Appendix A contains a list of software that was pulled off the Internet
and evaluated at various times during the course of this project. This
list is by no means a complete list of trouble tracking software, but
does cover most of the freely available packages. Multiple thanks go
to Remy Evard of Northeastern University for his appendix A, which
pointed out several packages I was unaware of.
Many of these packages are also available at
ftp://ftp.ccs.neu.edu/pub/sysadmin/tracking, or at
These packages are presented in alphabetical order.
GNATS, version 3.XX, available via ftp at prep.ai.mit.edu in /pub/gnu/gnatsXXXX. A Tk interface is also available at the same site.
NEARnet Trouble Tracking System, version X.XX, available via ftp at ftp.near.net in /pub/nearnet-ticket-system-v1.3.tar.
NETLOG, version X.XX, available via ftp.jvnc.net in /pub/netlog-tt.tar.Z.
PTS/Xpts, version X.XX, available via ftp at ftp.x.org in /contrib/ptsXXX.
Queue MH, available via ftp at ftp.cs.colorado.edu in /pub/sysadmin/utilities/queumh.tar.Z.
Req, version 1.XX, available via ftp at XXX.xxx in XXX/XXX/XXX. A Tk interface is also available at the same site.
Request, version X.XX, available via ftp at pearl.s1.gov in /pub/request/requestXX.
Requette, version X.XX, available via ftp at ftp.crim.ca in