
"Abstract"



Most system administrators work in an environment where they have multiple ongoing tasks. In some cases, a system administrator might have several hundred tasks in their list of "Things to do". These tasks might range from the "In your copious spare time, please do this" type of request to the "This project needs to be done this week" type. Some projects are too big for a single ticket, and might need to be broken into several tickets. One of those tasks might be "Install a new dialin mechanism for the users."

This paper discusses the software that I have developed to help me with all these issues. Several user patterns and some long-term trends and their implications are also discussed.

"Overview"



TTS is a Perl, Bourne shell, and mail-based trouble tracking system designed for use by system administrators. It differs from bug tracking systems and general-purpose time tracking systems in that system administrators have special needs that those other products do not meet.

Trouble tracking systems are used for many purposes by users, administrators, and management. RFC 1297 [1] lists eight uses for a trouble tracking system, which seem to encompass the uses I have seen. I refer the reader to the RFC for more details on these uses, but present a synopsis here for those unfamiliar with it. These uses are:

1) Short-term memory and communication.
2) Scheduling and work assignment.
3) Referrals and dispatching.
4) Alarm clock.
5) Oversight by engineers and customer/site representatives.
6) Statistical analysis.
7) Filtering current alerts.
8) Accountability (CYA).

"Site Information"



TTS has been used at sites with 22 to 60 users. It is currently in use at Objectivity in Mountain View, California, and at KnowledgeSet Corporation, also in Mountain View, California. The system has been tested with over 600 open tickets, and the response time has been acceptable. Depending upon local hardware and network conditions, it is expected that the system can handle over 1000 open tickets without serious response time delays, and over 30000 closed tickets without an unreasonable delay.

"History"



"Version 0"

Version 0 was an electronic file that was edited with a text editor. This file, while crude, contained the basic ideas of a trouble tracking system. In particular, it detailed (in 80 column format, no less) the requester, the request date, the problem, and the estimated time it would take to do the task. It also had an update history attached to each "Ticket". This was an indented line underneath each ticket with a date and the action accomplished on that date.



A primitive report generator was also part of V0: a script that took the estimated time to complete each task and totaled it. This was sufficient to show management that they had given me 26 weeks of work to do. Shortly thereafter, I was given a hiring requisition for another administrator.

"Version 1"

Version 1 (V1) was implemented the week after LISA 8 in San Diego, California. This was the conference where Req was introduced, and it seemed that half the participants were talking about trouble tracking systems. I decided that I should probably implement something a little more official, and with a few more features, than the text files that we were currently using.

"Version 2"

Version 2 (V2) was started in May of 1995, with the goal of making Version 1 more portable, and maybe of use to other sites. Significant enhancements were added to meet this goal. In particular, the "config.perl" script was created to change site-specific variables in all of the existing scripts. HTML enhancements were added after September of 1996 in order to let non-Unix users access the database. Most importantly, the transition to a single file per ticket was started.

The concept of an "Administrator" comment was also added after LISA 9 (September of 1995, Monterey). This was added because a certain administrator (who shall remain nameless) mentioned that she thought a field that could be seen only by administrators was a good idea. This was so that an administrator could safely add the comment "This user is acting like a rabid wildebeest, be careful!". This would warn any other administrators that they should be careful, without letting non-administrators know that the user was acting irrationally. (Of course, none of my users ever acted like a rabid wildebeest. Maybe a rabid pit-bull, but never a rabid wildebeest.)

"Version 3"

Version 3 was started in February of 1996, in a concurrent development effort with version 2.85. While bug fixes and some minor enhancements were made to Version 2, the majority of new development work was done to Version 3.0. Version 3 was started after a review of features of competing products was undertaken, and several interesting features in these products were noted. These ideas came mainly from "Req, version 1.XX", and "Gnats, Version X.XX".

The main difference between V3 and prior releases was that the idea of a single file containing all the trouble tickets was finally abandoned. While some residual code exists for the earlier paradigm, it is no longer supported as such, and will eventually be phased out in favor of a faster interface.

"TTS Requirements Analysis"

In the tradition of most quick hacks, an official list of system requirements was not done until late in the project. While many of the requirements were known at the start of the project, they were not codified until the project was almost completed. The effect of this was that there were several features that were not added early in the project, when it would have been easy. There were also several dead ends reached in the growth of TTS that could have been avoided had a real requirements analysis been done at the start. On the other hand, had a real requirements analysis been attempted at the start, the project might never have even been started.

What follows is the list of system requirements that developed over the course of the project. Of note are the requirements marked with an "*", as those were not among the original, unwritten requirements.

"Ticket Submission"



-Easy for all computer users (Mac, Unix, PC, etc) to submit a trouble ticket.



"User Interface"

-Easy for all computer users to view the current status of their trouble tickets.

-The ability to assign a priority level to each ticket.



"Administrator Interface"

-The ability to assign tickets to specific administrators.

-Administrators will be using Unix workstations.

-Administrators are comfortable with using text editors.

-There should not be a single specific interface to the trouble tickets. Administrators should have their pick of tools to use.



"Hardware and Software platforms"

-Support for users on non-Unix workstations (VMS, PC-Dos, PC-Windows, NT, Macintosh, NeXT workstations, etc.)

-Support for administrators on multiple flavors of Unix (SunOS, AIX, Ultrix, HPUX, etc.). Not all of these may support a working network file locking mechanism.

-It should be usable across several machines of different architectures simultaneously.

-It should not require an expensive back-end database.

-Usable by multiple administrators simultaneously.(*)



"Reporting Formats"

-Users care more about open tickets than closed tickets.

-The ability to track how much time any one administrator has spent on any ticket, or on a select set of tickets.(*)

-Multiple report formats, including selecting tickets that are open or closed, selecting by multiple field selection criteria, and with multiple management oriented reports (Time spent per admin per week, length of time in queue, average time used per ticket, etc.)



"Security Considerations"



The basic security paradigm has three rules: 1) Any user can see any ticket in the system, 2) We trust our users, and 3) We need to help keep honest users honest. It was decided early on that any determined individual would be able to read, and possibly edit, trouble tickets. This is an unavoidable consequence of the design need for maximum visibility to users of their trouble tickets. No major effort has yet been expended to add security above the basic paradigm. If security is needed above this paradigm, then direct user access to the ticket database can and should be limited (with a corresponding decrease in usability for administrators and users).

This security paradigm has had several effects upon the functionality of TTS. Most notable is the ticket submission method. TTS allows any user to submit a ticket, through either an HTML interface or email. The alternative is to have the system administrator (or a set of people at the "help" desk) enter each ticket. As TTS was initially designed for a small work environment where the systems group may consist of a single over-worked administrator, this was unacceptable. Because of this, it was decided that an email interface to submit and view tickets was acceptable. An email method to modify or close tickets was rejected as pushing the envelope of trusting users. As such, the most damage a user could do is to flood the system with bogus tickets until the disk partition is full. As this is no different from the damage a malicious user can do anyway, it was decided that this was an acceptable risk.

Also notable is the absence of a method of submitting tickets that are already closed. This is due to the system's reliance upon sendmail as a method of submitting tickets. At some future point, there may be a method of creating tickets from within the main 'tt' program, which would rely upon checking the user's UID against a list of "allowed" UIDs. At this point, an administrator has to create a ticket and then manually close it. Submitting already-closed tickets is a feature that is included in at least one other trouble tracking system [X3].

A secondary effect is that any user can see the existence of any other ticket in the system, including the one-line problem description, who submitted the ticket, and several other fields. A "confidential" field has been added to the system which stops a normal user from seeing any of the other details of a ticket. Even so, it would still be difficult to change the system to disallow access to all of the information in a trouble ticket. This may be changed in a future release of the system. Once again, if this is an issue, then the system can be configured to not allow any user access to TTS.
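
The effect of the "confidential" field and of private administrator comments on what a normal user sees can be sketched as follows. This is an illustration in Python, not the TTS code (which is Perl and Bourne shell). The on-disk layout of one "FIELD: value" pair per line, the function names, and the exact set of fields that stay visible on a confidential ticket are all assumptions made for the example; the field names themselves come from Appendix B.

```python
# Fields assumed to stay visible on a confidential ticket (hypothetical set).
PUBLIC_WHEN_CONFIDENTIAL = {
    "TICKET NUMBER", "REPORTED BY", "ASSIGNED TO",
    "PROBLEM STATUS", "PROBLEM DESCRIPTION",
}

def parse_ticket(text):
    """Split 'FIELD: value' lines into ordered (field, value) pairs.

    A line that does not start with an upper-case field name is treated
    as a continuation of the previous field.
    """
    pairs = []
    for line in text.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.isupper():
            pairs.append((field.strip(), value.strip()))
        elif pairs:
            last_field, last_value = pairs[-1]
            pairs[-1] = (last_field, (last_value + "\n" + line).strip())
    return pairs

def user_view(pairs):
    """Return only the fields a non-administrator should be shown."""
    confidential = dict(pairs).get("CONFIDENTIAL", "No").lower() == "yes"
    visible = []
    for field, value in pairs:
        if field == "ADMIN COMMENTS":
            continue  # private note between administrators
        if confidential and field not in PUBLIC_WHEN_CONFIDENTIAL:
            continue
        visible.append((field, value))
    return visible

sample = (
    "CONFIDENTIAL: No\n"
    "TICKET NUMBER: 00042\n"
    "REPORTED BY: alice\n"
    "TIME ESTIMATED: 0:15\n"
    "PROBLEM DESCRIPTION: Printer is down\n"
    "ADMIN COMMENTS: Be careful with this user\n"
    "SOLUTION: Replaced toner\n"
)
normal_view = user_view(parse_ticket(sample))
secret_view = user_view(parse_ticket(sample.replace("CONFIDENTIAL: No", "CONFIDENTIAL: Yes")))
```

A filter like this only controls what the supplied tools display. As noted above, a determined user with direct access to the ticket files can still read them.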

"Using TTS"



"Submitting a ticket"

Submitting a ticket is the first step in using TTS. The core engine is "tts.mail", which is called via an email alias, or indirectly from an HTML form. This program assigns a ticket number, assigns the ticket to an administrator, sends copies of the ticket back to the user and to the administrator, and adds the ticket to the database. This section of code can be configured to add multiple "cc" recipients, if a site requires it. An example might be a director of Engineering who wants to see every problem that is submitted to the system, as it is submitted. (Of course, if you decide to start sending mail from your complainer scripts, as well as output from root cron jobs, then it should only be a matter of time before the people who requested being "cc'ed" request to be taken off the list [2].)
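
The submission steps can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual "tts.mail" program: the spool layout with "open" and "closed" subdirectories, the function names, the way the next ticket number is chosen, and passing the assigned administrator in as a parameter are all hypothetical. Mail delivery is left out; the function only returns the list of people who would receive a copy.

```python
import os
import tempfile

def next_ticket_number(spool_dir):
    """Return one more than the highest ticket number in open/ and closed/."""
    highest = -1
    for sub in ("open", "closed"):
        path = os.path.join(spool_dir, sub)
        if os.path.isdir(path):
            for name in os.listdir(path):
                if name.isdigit():
                    highest = max(highest, int(name))
    return highest + 1

def submit_ticket(spool_dir, reported_by, subject, admin, cc=()):
    """Create one file for the new ticket; return (ticket name, recipients)."""
    open_dir = os.path.join(spool_dir, "open")
    os.makedirs(open_dir, exist_ok=True)
    name = "%05d" % next_ticket_number(spool_dir)
    fields = [
        "TICKET NUMBER: " + name,
        "TICKET PRIORITY: MM",
        "REPORTED BY: " + reported_by,
        "ASSIGNED TO: " + admin,
        "PROBLEM STATUS: Not Yet Reviewed",
        "PROBLEM DESCRIPTION: " + subject,
    ]
    # Mode "x" refuses to overwrite an existing ticket file. A real system
    # would also need to guard against two simultaneous submissions.
    with open(os.path.join(open_dir, name), "x") as handle:
        handle.write("\n".join(fields) + "\n")
    return name, [reported_by, admin] + list(cc)

spool = tempfile.mkdtemp()
first, _ = submit_ticket(spool, "alice", "Printer is down", "admin1")
second, recipients = submit_ticket(spool, "bob", "Need more disk", "admin1",
                                   cc=["director"])
```

Keeping each ticket in its own file means that submission never has to rewrite an existing ticket, which fits the one-file-per-ticket design described earlier.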

"Replying to, editing and closing a ticket"

An administrator can use two methods to respond to a ticket: either an email program, or the 'tt' program. The advantage of using the 'tt' program is that it records the fact that a reply was sent (in the "last customer contact" field), and that it adds an entry to the log file indicating that a reply was sent. It also creates a "lock" on the ticket so that another administrator cannot make modifications to the ticket at the same time. This ability to use 'tt' or regular email ties in well with the requirement of multiple interfaces to the system.

A similar ability exists with editing and closing tickets. An administrator may either use the 'tt' program, or may use a text editor or a mail program to edit and close tickets. (Closing a ticket is just moving the ticket out of the "open" spool directory to the "closed" spool directory.) The big advantage of using the supplied tools is that they create lock files, and will "preload" certain fields for the administrator. This preloading of fields is beneficial in that it helps reduce the editing time of the ticket. The lock files also allow more than one administrator to safely access the trouble tickets without fear of their edits being lost to another simultaneous edit. Of course, in a single administrator environment, there is no worry of editing collision. It is felt that the benefits of preloading fields make it worthwhile to use the 'tt' interface when closing and replying to tickets.
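
A lock file and the move from the "open" to the "closed" spool directory can be sketched as below. This is a Python illustration, not the 'tt' code; how TTS actually creates its lock files is not described here. The ".lock" suffix, the function names, and the directory layout are assumptions. The sketch relies on exclusive file creation, which has historically been unreliable over some network file systems, so a real multi-host installation may need a different technique.

```python
import os
import tempfile

def acquire_lock(ticket_path, owner):
    """Try to create <ticket>.lock; return False if someone else holds it."""
    try:
        fd = os.open(ticket_path + ".lock",
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(owner + "\n")  # record who is editing the ticket
    return True

def release_lock(ticket_path):
    os.remove(ticket_path + ".lock")

def close_ticket(spool_dir, name, owner):
    """Move a ticket from open/ to closed/ while holding its lock."""
    src = os.path.join(spool_dir, "open", name)
    dst = os.path.join(spool_dir, "closed", name)
    if not acquire_lock(src, owner):
        return False
    try:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.rename(src, dst)
    finally:
        release_lock(src)
    return True

spool = tempfile.mkdtemp()
os.makedirs(os.path.join(spool, "open"))
ticket = os.path.join(spool, "open", "00042")
with open(ticket, "w") as handle:
    handle.write("TICKET NUMBER: 00042\n")

acquire_lock(ticket, "admin1")                    # admin1 starts editing
blocked = close_ticket(spool, "00042", "admin2")  # refused while locked
release_lock(ticket)                              # admin1 finishes
closed = close_ticket(spool, "00042", "admin2")   # now succeeds
```

An administrator who edits the file directly with a text editor bypasses the lock entirely, which is the trade-off of allowing multiple interfaces.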

"Web server"

The web server is an integral part of complying with the requirement that multiple architectures can submit and view tickets. Early versions of TTS were written with the intent of later writing client software that would do RPC calls to a Unix server. Luckily, the explosion of web servers and browsers negated that coding headache. The web server currently serves two purposes: the submission of trouble tickets, and the viewing of several premade reports. These reports are created every hour, and have allowed me to delay writing certain HTML enhancements (a form and CGI front end to the report generator). These reports also help keep the server load down, as most users can usually find their ticket easily enough in one of the 12 premade reports.

"Reporting"

Making reports, in particular reports to management, is the whole reason for having this system. As such, the basic report generator resembles a VMS program more than a typical Unix program (large, with lots and lots of options, versus small with a few options).

One of the design effects has been that the age of a ticket is measured in days, instead of in hours, minutes, or seconds. This is due to the nature of system administration requests. Very few system requests come in with a highly time sensitive nature. This can be contrasted to a Network Operations center, where tracking a trouble ticket by the minute might be important. (An example of a time sensitive request might be "Router router15.isp.net went down at 15:34, and the customer is off the Internet.")

There is also a weekly status report writer. This was written in response to management's desire to have a weekly report on what the Systems Team did the prior week. It is typically run by hand late Friday afternoon, and is then incorporated into a report that details the upcoming projects, as well as the status of current major projects, and that week's roadblocks.

The program make.status.report goes through recently closed tickets to create the report. It lists the tickets that have been closed, their age in the queue, the average length of time used per ticket, the time spent in the last week on those tickets, and a total of all the time used on the tickets since they were first opened. It also lists tickets that have been worked on but not yet closed, and reports the same statistics for those tickets.
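
The statistics in such a report can be sketched as follows. This is a Python illustration of the arithmetic only, not make.status.report itself. The in-memory ticket records, the function names, and the sample tickets are hypothetical; times use the H:MM form shown for the time fields in Appendix B, and ages are counted in whole days.

```python
from datetime import date, timedelta

def parse_hmm(text):
    """Convert an 'H:MM' string such as '0:15' to minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

def format_hmm(minutes):
    """Convert minutes back to 'H:MM'."""
    return "%d:%02d" % divmod(minutes, 60)

def status_report(tickets, today):
    """Return per-ticket rows and the average time used per ticket."""
    week_ago = today - timedelta(days=7)
    rows = []
    grand_total = 0
    for t in tickets:
        total = sum(parse_hmm(spent) for _, spent in t["time_spent"])
        recent = sum(parse_hmm(spent)
                     for day, spent in t["time_spent"] if day > week_ago)
        end = t["closed"] or today  # open tickets age until today
        rows.append((t["number"], (end - t["submitted"]).days,
                     format_hmm(recent), format_hmm(total)))
        grand_total += total
    average = format_hmm(grand_total // len(rows)) if rows else "0:00"
    return rows, average

tickets = [
    {"number": "00017", "submitted": date(1996, 9, 2), "closed": date(1996, 9, 12),
     "time_spent": [(date(1996, 9, 3), "0:05"), (date(1996, 9, 11), "1:30")]},
    {"number": "00021", "submitted": date(1996, 9, 9), "closed": None,
     "time_spent": [(date(1996, 9, 10), "0:45")]},
]
rows, average = status_report(tickets, today=date(1996, 9, 13))
# rows    -> [("00017", 10, "1:30", "1:35"), ("00021", 4, "0:45", "0:45")]
# average -> "1:10"
```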

"Future Plans for TTS"

The future plans for TTS fall into four categories: bug fixes, speed enhancements, HTML enhancements, and a major rework of the report generator to allow better database querying ("and", "or", and "not" statements, as well as parenthesized expressions).

At this point in time, only the first three have been scheduled. Bug fixes will be done as bugs are noticed. Minor speed enhancements will be done as bugs are being fixed. There are no major speed enhancements planned at this point in time. This is due to a lack of a method, rather than a lack of time. The transition to one ticket per file, and the addition of indexes were the last two major speed enhancements. Putting the indexes into some kind of DBM file is against several of the system's requirements for usability and portability.

The addition of an HTML form to the report generator is planned. Also in the pipeline is adding a graphical mode for the statistics reports. An HTML interface to allow the editing of trouble tickets should be completed by the end of the year. The presence of such a feature might allow unauthorized users to edit tickets, as well as to view confidential tickets or the "Admin Comments" field. The full security ramifications of this have not yet been detailed.

Adding better expression parsing is waiting for a volunteer to help me with this, as this is beyond my comfort level as a programmer. There are a few "quick hacks" which are under consideration, but nothing definite has been planned.
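
For readers curious what such expression parsing involves, one common approach is a small recursive descent evaluator. The sketch below is in Python and is not part of TTS; the query syntax (terms of the form field=value, lower-case "and", "or", and "not", values without spaces) and the lower-case field names are assumptions made for the example.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")

def matches(query, ticket):
    """Evaluate a query such as 'priority=H and not status=closed'."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():                  # lowest precedence
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()        # always parse, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "not":
            take()
            return not parse_not()
        return parse_term()

    def parse_term():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        field, _, wanted = tok.partition("=")
        return ticket.get(field, "").lower() == wanted.lower()

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result

ticket = {"priority": "H", "status": "open", "department": "E"}
simple = matches("priority=H and status=open", ticket)
nested = matches("priority=L or (department=E and not status=closed)", ticket)
negated = matches("not (priority=H or department=S)", ticket)
```

Each precedence level gets its own function, so "not" binds tighter than "and", which binds tighter than "or", and parentheses restart at the lowest level.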

"TTS Usage Patterns"



Several interesting user usage patterns have emerged from the three companies where I was intimately involved with the trouble tracking system. These tend to break down into user related issues, and long term trends.

"User Usage Patterns"

There is a distinct correlation of the usage patterns of TTS with a "Product Life cycle" that sales and marketing people talk about. Those phases are Introduction (1-2 weeks), Growth (2-6 weeks), Maturity (7+ weeks), and Decline (The introduction of a new product for time and problem tracking).

"The Introduction Phase"

The people who are most likely to use TTS during the Introduction phase are usually a small group of people who will jump on any new technology (Early Adopters). (Probably the same group of people who download and compile the latest copy of gcc the day it's released!) These users are important for several reasons. They help iron out any installation bugs, and they also help create an atmosphere of acceptance for the product. This is important because you can use these users as implicit "peer pressure" to encourage other users to start using the system. It is important that these users be rewarded with prompt action on their requests. This is both a reward for their being early adopters, and also helps them talk about the "quick response" that they had after submitting a ticket. This is important for the growth phase.

"The Growth Phase"

The Growth phase typically starts one to two weeks after the system has been announced. The initial installation bugs have been worked out, management has publicly endorsed the product, and the office gossip has gotten around that submitting problems through TTS gets results. The users who join during this phase are typically the programmer/engineering employees. This is probably due to their current use of a bug/problem tracking system for the main product of the company.

During this phase, the number of tickets entered into the system will vary widely from day to day, and from week to week. This is because of users deciding to submit tickets instead of grabbing someone in the hallway, and from users submitting long term or old requests into the system. This is the period of time when the systems team has to start enforcing a policy where users submit a ticket instead of making a phone call, or sending email to their favorite administrator.

The "late adopters" of the growth phase are typically the non-engineering employees (sales, marketing, and administration). These users will eventually start using the tracking system, if they are going to use it at all. They will probably need a fair amount of one-on-one instruction about how to use the system, and the benefits to them of using it.

The Growth phase is also a period of self training for the admin team. All requests for help must be entered into the system by the administrators. This is necessary for long range tracking of tickets. The admin team must also become used to the assorted interfaces during this period. And lastly, the admins need to get used to closing tickets that have been finished. Not doing any of these can lead to the self destruction of the system. (What good are the statistics, if everyone knows that the data behind them is bad?)

"The Maturity Phase"

The "Maturity" phase seems to start about two months after the introduction of the tracking system. This is indicated by the number of tickets submitted every week stabilizing. Most of the users who are going to use the system are using it, and most of the old requests and long term projects have already been submitted into the system.

It appears that the higher a user is in the management food chain, the less likely they are to submit their own tickets. This becomes very apparent during the Maturity phase. This may be a natural outgrowth of management being used to giving orders to their subordinates without having to give them explicit instructions, nor needing to fill out paperwork. The systems team will just have to open trouble tickets for these users. (Like you can tell the CEO to file a trouble ticket so that you will work on his printer?)

"The Decline Phase"

The Decline phase is when another product is installed that supersedes the current set of programs for trouble tracking. This might be due to management purchasing a professional product, or another set of programs from some future programmer, or maybe a significantly improved new release of TTS. The important things to consider when changing to a new product are keeping the old data available, and maintaining a similar method for users to submit tickets. TTS data can easily be re-submitted into a new system, as long as the new system allows a program to submit data (NOT manual entry of each ticket). Since each ticket is a single file under TTS, the work involved should be minimal. By keeping a similar method for submitting tickets, the pain of retraining users is minimized. This is important because the time involved with training users can be lengthy, and anything that can be done to minimize it will save time and money in the long run.

"Long Term Trends"

It seems that many users will not submit trouble tickets if the administrator they favor is not the person that the ticket will be assigned to. There have been many incidents where a user would come by to see who the administrator of the week is, and would then go away. Checking into this revealed that they were waiting for their preferred administrator to be on the "hotseat". Checking the database for tickets submitted by some of those users indicated a definite preference for certain administrators. (In one case, over 80% of one user's tickets went to a single administrator.)

Another interesting pattern is that ticket submission drops dramatically when there is no live administrator to deal with a problem. For example, when a conference or training session is scheduled, and the users are aware of it, the users do not submit tickets. There is then a corresponding upswing of ticket submissions after the event is over. Interestingly enough, this upswing never makes up for all the unsubmitted tickets. This leads me to believe that many users really can solve many of their problems on their own. (How efficient they are is another paper for yet another conference.)

One of the most serious trends is noticeable only after two or three months' worth of data has been gathered. This trend is the "Not enough help" trend. This is where you have 30 hours per week of schedulable administrator time, but you are receiving 30+ hours of requests every week. It is at that point that you can go to your management and ask them to either hire more staff, or allow you to reduce the number of tasks that the systems team is responsible for. And if they refuse to do either, then maybe it's time to find another manager. (At least you can quantify what kind of hole you've found yourself in.)
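
The size of the hole is simple arithmetic once the weekly numbers are known. The Python sketch below is illustrative only: the function name is hypothetical, and the figure of 36 requested hours per week is an invented example of "30+ hours", not a measurement from any TTS site.

```python
def backlog_after(weeks, capacity_hours, request_hours, starting_backlog=0.0):
    """Hours of unfinished work after `weeks` weeks of steady load."""
    backlog = starting_backlog
    for _ in range(weeks):
        # Work arrives, capacity is used up, and the backlog never goes negative.
        backlog = max(0.0, backlog + request_hours - capacity_hours)
    return backlog

# 30 schedulable hours per week against a hypothetical 36 requested hours:
hole = backlog_after(12, capacity_hours=30, request_hours=36)
weeks_behind = hole / 30
# hole -> 72.0 hours after twelve weeks, or 2.4 weeks of full-time work
```

A number like this, backed by ticket data, is easier to take to management than a general complaint about being overloaded.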

"Acknowledgments"



Thanks to Pat Deuchar of Quantum, Mary Holstege and Robert Smith of KnowledgeSet Corp, Eric Dietiker of Dow Jones Telerate Systems, and John Jarocki of AMD for their assorted comments on TTS and helping me learn HTML.

Many thanks to my wife Mercedes for putting up with late night coding and documentation sessions.

"Availability"

TTS is available at http://www.wedaa.com/~ericw/software/

"Author Information"



Eric Wedaa has been a Unix system administrator since 1989. He currently works at KnowledgeSet Corporation, Mountain View, California, as the Senior System Administrator (and the one and only administrator). His career has consisted of working at small companies in Silicon Valley, with one mistake of working at a large chip manufacturing company. He has a B.S. in M.I.S. from the University of Arizona.

"Bibliography"

[1] RFC 1297, NOC Internal Integrated Trouble Ticket System Functional Specification Wishlist, D. Johnson, January 1992.

[2] Elizabeth Zwicky, Getting More Work Out of Work Tracking Systems, in LISA VIII pp 105-110, San Diego, CA, 1994.

[X3] ?????, ?????????, ????, ????, ????.

[4] Remy Evard, Managing the Ever-Growing To Do List, in LISA VIII, pp 111-116, San Diego, CA, 1994.

"Appendix A-List of other Trouble Tracking Systems"



Appendix A contains a list of software that was pulled off the Internet and evaluated at various times during the course of this project. This list is by no means a complete list of trouble tracking software, but does cover most of the freely available packages. Many thanks go to Remy Evard of Northeastern University for his appendix A [4], which pointed out several packages I was unaware of.

Many of these packages are also available at ftp://ftp.ccs.neu.edu/pub/sysadmin/tracking, or at ftp://ftp.XXX.com/pub/sysadmin/tracking.

These packages are presented in alphabetical order.

GNATS, version 3.XX, available via ftp at prep.ai.mit.edu in /pub/gnu/gnatsXXXX. A Tk interface is also available at the same site.

NEARnet Trouble Tracking System, version X.XX, available via ftp at ftp.near.net in /pub/nearnet-ticket-system-v1.3.tar.

NETLOG, version X.XX, available via ftp at ftp.jvnc.net in /pub/netlog-tt.tar.Z.

PTS/Xpts, version X.XX, available via ftp at ftp.x.org in /contrib/ptsXXX.

Queue MH, available via ftp at ftp.cs.colorado.edu in /pub/sysadmin/utilities/queumh.tar.Z.

Req, version 1.XX, available via ftp at XXX.xxx in XXX/XXX/XXX. A Tk interface is also available at the same site.

Request, version X.XX, available via ftp at pearl.s1.gov in /pub/request/requestXX.

Requette, version X.XX, available via ftp at ftp.crim.ca in /pub/requette-*.tar.Z.





"Appendix B-Field Descriptions for TTS"





"People Information"

CONFIDENTIAL: {Yes|No} Defaults to No, settable by the HTML interface.
TICKET NUMBER: 00000-99999, set by the ticket processor.
TICKET PRIORITY: {FIRE|H|MH|MM|ML|L} Defaults to MM, settable by the HTML interface.
REPORTED BY: {text} Person who submitted the ticket.
PHONE: {text} Set by phone_list entry or by the HTML interface.
OFFICE: {text} Set by phone_list entry or by the HTML interface.
ASSIGNED TO: email address of the admin the ticket is assigned to, set by the ticket processor.
FIXED BY: {Not Yet Fixed|email address} Defaults to Not Yet Fixed.

"Ticket Description"

PROBLEM STATUS: {Not Yet Reviewed|Reviewed|Resolved|Closed} Defaults to Not Yet Reviewed.
PROBLEM TYPE: {Text} Settable by the HTML interface.
KEYWORDS: {Text} Defaults to shell. Settable by the HTML interface.
DEPARTMENT: {1 char} E-engr; S-sales&mktg; F-fin&admin; C-company. Defaults to E. Settable by the HTML interface.

"Date Fields"

DATE PROBLEM STARTED: {Date} Defaults to current date.
DATE SUBMITTED: {Date} Defaults to current date.
LAST CUSTOMER CONTACT: {Date} Defaults to current date.
EST. COMPLETION: {Date} Defaults to DAY MMM DD, YYYY
DATE RESOLVED: {Date} Defaults to DAY MMM DD, YYYY
DATE CLOSED: {Date} Defaults to DAY MMM DD, YYYY

"Time Fields"

TIME ESTIMATED: 0:15 {Time} Defaults to 0:15.
TIME SPENT: $TIME_SPENT_DATE 0:05 $aow Initial Review of trouble ticket
TOTAL TIME USED: ?:?? {Time} Defaults to ?:??.

"Problem Description and Resolution"

PROBLEM DESCRIPTION: {Text} Defaults to email subject. Settable by the HTML interface.
PROBLEM REALLY WAS: {Text} Edited by the system administrator.
ADMIN COMMENTS: {Text} Edited by the system administrator. Any line that starts with "^ADMIN COMMENTS:" is a private comment between administrators and is not shown to the users. This line is deleted when the ticket is closed.
SOLUTION: {Text} Edited by the system administrator.