Introduction:
One of the most significant problems I have had in administering a USENET news server has been determining whether performance problems exist and what to do about them. This makes it hard to maintain a consistent quality of service for customers and downstream sites.
One of the main problems with tuning performance on a news server is that there are far too many variables. The more significant ones are:
To deal with all of these variables, the goal of my project is to gather significant amounts of data over a period of several days to determine where the bottlenecks lie. The most valuable data is the output of the INN timer patch. We supplement this with a fake reader client to measure end-user performance. We also measure the incoming feed rate based on the news logs.
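As a rough illustration of the feed-rate measurement, the sketch below counts accepted articles per minute from news log lines. It is written in Python rather than the Perl used by the actual tool, and the log layout (timestamp, a one-character status where `+` means accepted, then the feeding site) is an assumption; the sample lines and the name `accepted_per_minute` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout of a news log line (sample data below is made up):
#   Mon DD HH:MM:SS.mmm <status> <feed-site> <message-id> ...
# where a status of "+" is assumed to mean the article was accepted.
LOG_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}(?:\.\d+)?\s+"
    r"(?P<status>\S)\s+(?P<site>\S+)"
)

def accepted_per_minute(lines):
    """Count accepted articles per (month, day, hour, minute) bucket."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("status") != "+":
            continue  # skip unparsable lines and non-accepted articles
        bucket = (
            match.group("month"),
            int(match.group("day")),
            int(match.group("hour")),
            int(match.group("minute")),
        )
        counts[bucket] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    "Aug 12 10:01:02.123 + feed.example.com <a1@example.com> 1234 peer1",
    "Aug 12 10:01:40.500 + feed.example.com <a2@example.com> 1235 peer1",
    "Aug 12 10:01:59.900 - feed.example.com <a3@example.com> 437 Duplicate",
    "Aug 12 10:02:05.010 + other.example.net <a4@example.net> 1236 peer2",
]
rates = accepted_per_minute(sample)
```

Bucketing by minute keeps the series small enough to graph over several days; a finer or coarser bucket would work the same way.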
The idea is that all of this "real world" performance data is graphed over a reasonable period of time. The admin can then look for noticeable trends and peak values. Then the second layer of the program can be applied: we also collect a wide variety of system performance data and store it in the same database. This includes I/O and memory utilization, I/O latency, swap activity, etc.
My hope is that, with the right sort of graphs and reports, this data can be presented in a manner that helps to highlight the particular performance bottlenecks of this news server.
For example, if the trends show that a significant amount of swapping is going on, you could tune the kernel paging parameters, consider turning off some MMAP features in INN, or just add RAM. If the problem only showed up with a lot of readers connected, perhaps you would consider adding actived to reduce the nnrpd memory footprint.
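The reasoning in that example can be sketched as a small comparison: split the swap samples by reader load and compare the mean and peak of each group. This is only an illustration in Python; the sample values, the threshold, and the function names are hypothetical and are not taken from the tool.

```python
from statistics import mean

def summarize(samples):
    """Return the mean and peak of a non-empty list of numeric samples."""
    return {"mean": mean(samples), "peak": max(samples)}

def swap_by_reader_load(samples, reader_threshold):
    """Split (reader_count, pages_swapped) samples at a reader threshold."""
    busy = [swap for readers, swap in samples if readers >= reader_threshold]
    quiet = [swap for readers, swap in samples if readers < reader_threshold]
    return {
        "busy": summarize(busy) if busy else None,
        "quiet": summarize(quiet) if quiet else None,
    }

# Hypothetical samples: (active nnrpd readers, pages swapped per second).
samples = [(5, 0), (8, 2), (40, 30), (55, 50), (60, 40), (10, 1)]
report = swap_by_reader_load(samples, reader_threshold=30)
```

If the "busy" group swaps heavily while the "quiet" group does not, the reader processes are the likely memory pressure, which points toward the nnrpd-side remedies rather than kernel tuning.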
The initial implementation is INN- and Solaris-specific, but there is no reason it could not be modularized better, with modules written for other operating systems and news software.
Architecture:
The INNPerf system is implemented as two separate Perl scripts. The first, innperf, runs on each news server and handles all of the performance and configuration data collection. It sends all of its results to a MySQL database. The second script, innperf.cgi, pulls the data out of the database and lets the user display it via a web browser. The basic structure is shown below:
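The collector/viewer split can be sketched as follows. This is a minimal Python stand-in, not the Perl scripts themselves: an in-memory SQLite database replaces MySQL so the example is self-contained, and the `sample` table, its columns, and the helper names are all hypothetical.

```python
import sqlite3

def open_store():
    """Create an in-memory database with one hypothetical samples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample ("
        " host TEXT NOT NULL,"
        " taken_at INTEGER NOT NULL,"  # Unix timestamp of the measurement
        " metric TEXT NOT NULL,"
        " value REAL NOT NULL)"
    )
    return conn

def record(conn, host, taken_at, metric, value):
    """Collector side: store one measurement."""
    conn.execute(
        "INSERT INTO sample VALUES (?, ?, ?, ?)",
        (host, taken_at, metric, value),
    )

def series(conn, host, metric, start, end):
    """Viewer side: fetch one metric for one host, ordered by time."""
    rows = conn.execute(
        "SELECT taken_at, value FROM sample "
        "WHERE host = ? AND metric = ? AND taken_at BETWEEN ? AND ? "
        "ORDER BY taken_at",
        (host, metric, start, end),
    )
    return rows.fetchall()

conn = open_store()
record(conn, "news1", 1000, "swap_pages", 3.0)
record(conn, "news1", 1060, "swap_pages", 7.0)
record(conn, "news2", 1000, "swap_pages", 1.0)
points = series(conn, "news1", "swap_pages", 0, 2000)
```

Keeping every metric in one table keyed by host, time, and metric name is just one possible layout; it makes adding a new measurement cheap, at the cost of larger scans when graphing.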
Data Collected:
| Name | Description |
|---|---|
| CHILD_MAX | |
| FSFLUSHR | fsflush run rate |
| GPGSLO | page stealing low water mark |
| HOSTNAME | |
| MAXCLSYSPRI | maximum global priority in sys class |
| MINARMEM | minimum resident memory for avoiding deadlock |
| MINASMEM | minimum swappable memory for avoiding deadlock |
| NAUTOUP | auto update time limit in seconds |
| NPROCESSORS_ONLN | from psrinfo -v |
| OPEN_MAX | |
| OS | |
| PHYSMEM | |
| UNAME | /bin/uname -a |
| bufhwm | maximum memory allowed in buffer cache |
| v.v_maxup | maximum processes per user id |
| v.v_proc | maximum number of processes |
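A few of the values in the table are ordinary POSIX limits and can be read without Solaris-specific tools. The Python sketch below gathers that subset; it uses `os.sysconf` and `platform.uname` in place of the `psrinfo -v` and `/bin/uname -a` calls named in the table, so it only approximates those entries, and it does not cover the Solaris kernel tunables (FSFLUSHR, GPGSLO, and so on). The function names are hypothetical.

```python
import os
import platform
import socket

def sysconf_or_none(name):
    """Return os.sysconf(name), or None where the name is unsupported."""
    try:
        return os.sysconf(name)
    except (ValueError, OSError, AttributeError):
        return None

def collect_config():
    """Gather a portable subset of the configuration values."""
    return {
        "HOSTNAME": socket.gethostname(),
        "OS": platform.system(),
        # Approximates "uname -a"; field order may differ from the command.
        "UNAME": " ".join(platform.uname()),
        "CHILD_MAX": sysconf_or_none("SC_CHILD_MAX"),
        "OPEN_MAX": sysconf_or_none("SC_OPEN_MAX"),
        # Stand-in for the processor count taken from psrinfo -v.
        "NPROCESSORS_ONLN": sysconf_or_none("SC_NPROCESSORS_ONLN"),
    }

config = collect_config()
```

Returning `None` for unsupported names lets the same collector run on systems that lack a given limit instead of failing outright.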
Links:
Issues:
CGI performance is cruddy; the db design needs to be refined.
Need to implement log_event in the db.
Need to implement mount point->device tables.
mountpoint: config_id device_id path type interleave device device_id