Thanks to:
alan_at_nabeth
Allan E Johannesen
John P Speno
John J. Francini
Christopher K Davis
I got some good questions and suggestions from people on where to look
for the problem.  They suggested that I use nfsstat, vmstat, and the
command "ps -p 0 -m -o  wchan,state,time".  I was also advised to check
swapping, paging, and forking.
We realized that the high load may actually have been caused by the
NFS-mounted directory.
I got an informational post from John Francini:
> Pid 0, the nominal "kernel idle" process, also is where things like 
> NFS servers and clients live, along with all the other kernel 
> threads.  Consequently, soaking up idle time is just one piece of the 
> puzzle.
>
> To see what all is going on under PID 0, do
>
> ps -p 0 -m -o  wchan,state,time
And also from alan_at_nabeth:
> I believe the kernel idle "process" collects statistics
> for a whole bunch of kernel threads.  It may also count
> all the system's idle time.  A high load average is not
> indicative of high CPU utilization and serial ps(1)
> listings are not a good way to look at overall CPU
> usage.
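
As a rough illustration of that last point, here is a minimal Python
sketch of the classic BSD-style load average: an exponentially damped
average of how many threads are runnable or waiting, sampled at a fixed
interval.  The function name, the 5-second sampling period, and the
60-second window are assumptions for illustration; which thread states
Tru64 actually counts, and how often it samples, may differ.

```python
import math

def damped_load(samples, period=5.0, window=60.0, start=0.0):
    """Exponentially damped average of queue-length samples.

    samples: number of runnable/waiting threads seen at each tick
    period:  seconds between ticks (assumed value)
    window:  averaging window in seconds (60 for a 1-minute average)
    """
    decay = math.exp(-period / window)
    load = start
    for n in samples:
        # Old value fades by `decay`; the new sample fills in the rest.
        load = load * decay + n * (1.0 - decay)
    return load

# Three threads queued at every tick drive the figure toward 3,
# regardless of whether the CPU time is user, system, or idle.
print(round(damped_load([3] * 1000), 2))
```

The result is a count of threads, not a percentage, so a load of 3 on a
single-processor machine says that about three threads wanted service
on average; it says nothing about how the CPU time was split.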
[...]
Thanks for all the help,
Kevin 
   
[summary post]
> I've gotten some good suggestions so far, but most people are asking me
> questions, so I realize I left out some important details.  The machine
> is a single processor, with about 7 Gigs of memory, running 4.0F
> unpatched.  It is not really an NFS server, but it does export one
> directory to another Alpha, read only, via NFS.  Although, I've checked
> the other machine which has 10 people logged on, and it doesn't seem
> like anyone is doing anything.
> 
> Here is a trimmed copy of the top output, including the known process,
> some_compile.
> 
> > load averages:  3.11,  2.56,  2.51                                     14:26:01
> > 72 processes:  2 running, 24 sleeping, 41 idle
> > Cpu states: 14.0% user,  0.0% nice, 82.4% system,  3.9% idle
> > Memory: Real: 4352M/7201M act/tot  Virtual: 48M/20573M use/tot  Free: 1194M
> >
> >   PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
> > 22286 bob       42    0 2218M 2217M run   166:17 44.90% some_compile
> > 19011 root      44    0 2600K  393K sleep   1:01  0.30% top
> 
> Notice that even though some_compile uses 44.9% CPU, the Cpu state is
> 3.9% idle.  In fact, if you watch top, it is usually 0.0%.  The load
> average is what is worrying me.  I haven't seen this compile make the
> load go over 2.00, but it's into the 3's now.
> 
> swapon -s shows it is not using any swap space.  Here is the iostat
> output.  Notice there are barely any disk transfers.  Most of the CPU
> time is spent in system mode.
> 
> >       tty     fd0      rz0      rz9     rz16     cpu
> >  tin tout bps tps  bps tps  bps tps  bps tps  us ni sy id
> >    0   97   0   0   12   1    0   0    0   0  36  0 27 37
> >    0  348   0   0    0   0    0   0    0   0  15  0 84  1
> >    0  384   0   0   40   4    0   0    0   0  15  0 84  1
> >    2  456   0   0    0   0    0   0    0   0  19  0 81  0
> >    3  478   0   0    0   0    0   0    0   0  21  0 77  2
> 
> I may be barking up the wrong tree with the [kernel idle]; I just
> couldn't find anything else that made up for the CPU usage.
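
The CPU columns of the iostat samples quoted above can be averaged to
make the user/system split explicit.  This is only a sketch in Python:
the function name is made up for illustration, the rows are copied from
the post, and skipping the first row assumes (as is usual for iostat)
that the first report covers the time since boot rather than the last
interval.

```python
# The five iostat samples from the post; the last four columns
# are the CPU percentages: us ni sy id.
IOSTAT_ROWS = """\
   0   97   0   0   12   1    0   0    0   0  36  0 27 37
   0  348   0   0    0   0    0   0    0   0  15  0 84  1
   0  384   0   0   40   4    0   0    0   0  15  0 84  1
   2  456   0   0    0   0    0   0    0   0  19  0 81  0
   3  478   0   0    0   0    0   0    0   0  21  0 77  2
"""

def cpu_split(text, skip_first=True):
    """Average the us/ni/sy/id columns over the given iostat rows."""
    rows = [[int(field) for field in line.split()[-4:]]
            for line in text.splitlines() if line.strip()]
    if skip_first and len(rows) > 1:
        rows = rows[1:]  # assumed to be the since-boot summary
    names = ("user", "nice", "system", "idle")
    return {name: sum(row[i] for row in rows) / len(rows)
            for i, name in enumerate(names)}

print(cpu_split(IOSTAT_ROWS))
# {'user': 17.5, 'nice': 0.0, 'system': 81.5, 'idle': 1.0}
```

Over the four interval samples the CPU spends roughly 80% of its time
in system mode while the disks are almost idle, which is consistent
with the work being done inside the kernel rather than by some_compile
itself.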
[original post]
> > Today, I noticed that our ES40 is running at a load of ~3 when the
> > jobs it has shouldn't be pushing it much more than 1.  I looked through
> > all the running processes and noticed that the [kernel idle] process is
> > taking up quite a bit of CPU resource.  I checked the managers list
> > archive, and I only found questions when the %MEM is high, and nothing
> > for %CPU.
> >
> > # ps aux
> > USER        PID %CPU %MEM   VSZ  RSS TTY      S    STARTED         TIME COMMAND
> > root          0 51.7  3.6 3.56G 261M ??       R <    Jul 10  4-15:20:31 [kernel idle]
> >
> > I've checked our other Alphas and none of them have CPU usage nearly
> > like the ES40, even similarly loaded ones.  Can someone shed some light
> > on this?
-- 
Kevin Dea
System Administrator
Alpine Electronics Research of America
Received on Wed Aug 30 2000 - 00:44:57 NZST