Lenovo T61? High CPU usage? Look for bad drivers (…spewing hardware interrupts)

For a couple of months, I noticed some pretty unusual CPU utilization on my Lenovo T61 Thinkpad (running Windows XP SP3).  I run Process Explorer (ProcExp) in the background all day, to keep an eye out for unusual system performance behaviour, and lately I’ve noticed that almost an entire “core” on my Intel dual-core system is consumed by something running in Kernel space.

notificationAreaI have ProcExp configured to display the CPU meter in the “systray” (aka notification area), and it’s really easy to know when there’s a kernel-consumer on your system – that’s when the CPU utilization graph will show a sizeable proportion of the line graph in red, rather than its more typical green (for User space consumers).

procexp Pop open ProcExp under this condition, and if you’ve spent as much time watching these behaviour profiles as I have, and you’ll be as surprised as I was to find the “Interrupts” process (really, just an abstract aggregate) reporting CPU utilization in the 30’s or 40’s.

 

Summary of the solution: bad drivers, needed to be updated.  In my case it was a network driver that had just been updated by the vendor; for others the culprits seem to be video drivers.  I’m sure other failures exist, but these are the two most common.

Step 1: Device Manager

When hardware and/or drivers are acting up like this, check to make sure your drivers are loading properly in Windows:

  • Open the Device Manager control panel applet (e.g. run “devmgmt.msc”)
  • Look at the list to see if any categories are expanded
  • In every expanded category, examine the details of any device that has a yellow exclamation mark (indicating there is some problem with the driver).  If at all possible, resolve the issue that’s been identified.

Until all driver issues have been resolved, any other suggestions you follow below could be moot.  If you really can’t fix the problem(s) that Device Manager is presenting, but want to try other approaches, then there are a couple of tactical workarounds you may wish to try:

  1. Install a newer (or older) version of the driver, to see if it’s just the current driver version that’s having problems.
  2. Disable the device temporarily, and make sure you are still seeing the excessive hardware interrupts being generated for a sustained period.  (If not, then perhaps that was the offending driver, and perhaps you’ll be able to run with the device disabled while the vendor addresses the issue in a later version of the driver.  Lucky you.)

Step 2: KernRate

Microsoft has an interesting little package, developed for use on Windows XP, to dig into a little detail on the Kernel-related activity on the system.  It’s not exactly a wizard I’d send to my mom, but it’s helped at least narrow the potential issues to track down for me.

You can download KernRate from here.

2.1 Setup & Config for a valid KernRate result

  • Install Windows Debugging Tools
  • Install KernRate
  • Configure valid symbols PATH variable – see this thread for details
    • Create a new directory C:\Symbols
    • Create a new environment variable by clicking on: Start menu > Control Panel > System > Advanced > Environment Variables > click New under “user variables for …”, then type:
      • Variable name: _NT_SYMBOL_PATH
      • Variable value:
  • Copy DBGHELP.DLL and SYMSRV.DLL from the install location for Windows Debugging Tools (usually C:\Program Files\Debugging Tools for Windows (x86)) to the install location for KernRate (usually C:\Program Files\KrView\Kernrates)

2.2 Use KernRate to profile kernel activity

Once it’s installed and configured correctly, you can just fire up Kernrate_i386_xp.exe, let it run for a few minutes (during the period when the Kernel activity is not what you’d expect), and then hit [Ctrl]-C to get the results of the Kernrate run.

Here’s a typical output from my system during normal operating conditions  (NOTE: for every Kernrate output, I’ve left out the entries from the final table that have no significant “hits” i.e. no significant volume of kernel events):

C:\Program Files\KrView\Kernrates>Kernrate_i386_XP.exe
/==============================\

\==============================/
Date: 2009/06/09   Time: 13:50:38
Machine Name: MIKESL-MOBL1
Number of Processors: 2
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0f0b
Physical Memory: 2007 MB
Pagefile Total: 3856 MB
Virtual Total: 2047 MB
PageFile1: \??\C:\pagefile.sys, 2006MB
OS Version: 5.1 Build 2600 Service-Pack: 3.0
WinDir: C:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe

Kernel Profile (PID = 0): Source= Time,
Using Kernrate Default Rate of 25000 events/hit
Starting to collect profile data

***> Press ctrl-c to finish collecting profile data
===> Finished Collecting Data, Starting to Process Results

————Overall Summary:————–

P0     K 0:00:11.296 (17.7%)  U 0:00:06.171 ( 9.7%)  I 0:00:46.328 (72.6%)  DPC
0:00:00.078 ( 0.1%)  Interrupt 0:00:00.031 ( 0.0%)
       Interrupts= 76929, Interrupt Rate= 1206/sec.

P1     K 0:00:12.328 (19.3%)  U 0:00:07.718 (12.1%)  I 0:00:43.750 (68.6%)  DPC
0:00:00.109 ( 0.2%)  Interrupt 0:00:00.531 ( 0.8%)
       Interrupts= 15687, Interrupt Rate= 246/sec.

TOTAL  K 0:00:23.625 (18.5%)  U 0:00:13.890 (10.9%)  I 0:01:30.078 (70.6%)  DPC
0:00:00.187 ( 0.1%)  Interrupt 0:00:00.562 ( 0.4%)
       Total Interrupts= 92616, Total Interrupt Rate= 1452/sec.

Total Profile Time = 63796 msec

                                       BytesStart          BytesStop         Byt
esDiff.
    Available Physical Memory   ,       184655872,       180600832,        -4055
040
    Available Pagefile(s)       ,      1075990528,      1069088768,        -6901
760
    Available Virtual           ,      2132107264,      2131058688,        -1048
576
    Available Extended Virtual  ,               0,               0,
  0

                                  Total      Avg. Rate
    Context Switches     ,       949736,         14887/sec.
    System Calls         ,      2754324,         43173/sec.
    Page Faults          ,       799583,         12533/sec.
    I/O Read Operations  ,         5382,         84/sec.
    I/O Write Operations ,         4657,         73/sec.
    I/O Other Operations ,        92064,         1443/sec.
    I/O Read Bytes       ,     15853022,         2946/ I/O
    I/O Write Bytes      ,      1968451,         423/ I/O
    I/O Other Bytes      ,      9179346,         100/ I/O

—————————–

Results for Kernel Mode:
—————————–

OutputResults: KernelModuleCount = 229
Percentage in the following table is based on the Total Hits for the Kernel

Time   11604 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
ntkrnlpa                               9041      63796    77 %     3542933
hal                                    1155      63796     9 %      452614
win32k                                  604      63796     5 %      236691
mfehidk                                 179      63796     1 %       70145
intelppm                                163      63796     1 %       63875
igxpmp32                                112      63796     0 %       43889
HIPK                                     98      63796     0 %       38403 
……

================================= END OF RUN ==================================
============================== NORMAL END OF RUN ==============================

And here’s the output from my system during one of these CPU over-consumption cases:

C:\Program Files\KrView\Kernrates>Kernrate_i386_XP.exe
/==============================\

\==============================/
Date: 2009/06/10   Time:  8:41:26
Machine Name: MIKESL-MOBL1
Number of Processors: 2
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0f0b
Physical Memory: 2007 MB
Pagefile Total: 3856 MB
Virtual Total: 2047 MB
PageFile1: \??\C:\pagefile.sys, 2006MB
OS Version: 5.1 Build 2600 Service-Pack: 3.0
WinDir: C:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe

Kernel Profile (PID = 0): Source= Time,
Using Kernrate Default Rate of 25000 events/hit
Starting to collect profile data

***> Press ctrl-c to finish collecting profile data
===> Finished Collecting Data, Starting to Process Results

————Overall Summary:————–

P0     K 0:02:26.906 (80.8%)  U 0:00:21.640 (11.9%)  I 0:00:13.328 ( 7.3%)  DPC
0:00:01.078 ( 0.6%)  Interrupt 0:01:18.406 (43.1%)
       Interrupts= 3919032, Interrupt Rate= 21548/sec.

P1     K 0:02:14.000 (73.7%)  U 0:00:32.703 (18.0%)  I 0:00:15.171 ( 8.3%)  DPC
0:00:00.640 ( 0.4%)  Interrupt 0:01:02.890 (34.6%)
       Interrupts= 3837550, Interrupt Rate= 21100/sec.

TOTAL  K 0:04:40.906 (77.2%)  U 0:00:54.343 (14.9%)  I 0:00:28.500 ( 7.8%)  DPC
0:00:01.718 ( 0.5%)  Interrupt 0:02:21.296 (38.8%)
       Total Interrupts= 7756582, Total Interrupt Rate= 42648/sec.

Total Profile Time = 181875 msec

                                       BytesStart          BytesStop         By
esDiff.
    Available Physical Memory   ,       564551680,       571240448,         668
768
    Available Pagefile(s)       ,      2194202624,      2233167872,        3896
248
    Available Virtual           ,      2132402176,      2131353600,        -104
576
    Available Extended Virtual  ,               0,               0,
  0

                                  Total      Avg. Rate
    Context Switches     ,      1764882,         9704/sec.
    System Calls         ,     14821228,         81491/sec.
    Page Faults          ,       188084,         1034/sec.
    I/O Read Operations  ,        31768,         175/sec.
    I/O Write Operations ,        27985,         154/sec.
    I/O Other Operations ,       321733,         1769/sec.
    I/O Read Bytes       ,     24755975,         779/ I/O
    I/O Write Bytes      ,     14122756,         505/ I/O
    I/O Other Bytes      ,     31150904,         97/ I/O

—————————–
Results for Kernel Mode:
—————————–

OutputResults: KernelModuleCount = 227
Percentage in the following table is based on the Total Hits for the Kernel

Time   62240 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
ntkrnlpa                              40159     181875    64 %     5520137
hal                                   10990     181875    17 %     1510652
igxpmp32                               2381     181875     3 %      327285
HECI                                   2170     181875     3 %      298281
win32k                                 1495     181875     2 %      205498
intelppm                               1335     181875     2 %      183505
mfehidk                                 931     181875     1 %      127972
HIPK                                    660     181875     1 %       90721
Ntfs                                    361     181875     0 %       49621
usbuhci                                 285     181875     0 %       39175
PGPwded                                 223     181875     0 %       30652

……

================================= END OF RUN ==================================
============================== NORMAL END OF RUN ==============================
 

 

2.3 KernRate detailed profiling on individual modules

So what I’m seeing is (like many other folks on the ‘Net) that the NTKRNLPA module is consuming the lions’ share of the CPU (being consumed by Kernel-space threads).  Following some of the threads on the Sysinternals message boards, I dug into the details of what images were being loaded by NTKRNLPA during these kernel events:

2.3.1 NTKRNLPA

C:\Program Files\KrView\Kernrates>Kernrate_i386_XP.exe -s 5 -z ntkrnlpa
/==============================\

\==============================/
Date: 2009/06/10   Time:  8:49:42
Machine Name: MIKESL-MOBL1
Number of Processors: 2
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0f0b
Physical Memory: 2007 MB
Pagefile Total: 3856 MB
Virtual Total: 2047 MB
PageFile1: \??\C:\pagefile.sys, 2006MB
OS Version: 5.1 Build 2600 Service-Pack: 3.0
WinDir: C:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe -s 5 -z ntkrnlpa

Kernel Profile (PID = 0): Source= Time,
Using Kernrate Default Rate of 25000 events/hit
CallBack: Finished Attempt to Load symbols for 804d7000 \WINDOWS\system32\ntkrnlpa.exe

Starting to collect profile data

Will collect profile data for 5 seconds
===> Finished Collecting Data, Starting to Process Results

————Overall Summary:————–

P0     K 0:00:01.984 (39.7%)  U 0:00:00.406 ( 8.1%)  I 0:00:02.609 (52.2%)  DPC
0:00:00.031 ( 0.6%)  Interrupt 0:00:01.421 (28.4%)
       Interrupts= 97650, Interrupt Rate= 19530/sec.

P1     K 0:00:02.125 (42.5%)  U 0:00:00.453 ( 9.1%)  I 0:00:02.421 (48.4%)  DPC
0:00:00.062 ( 1.3%)  Interrupt 0:00:01.406 (28.1%)
       Interrupts= 95410, Interrupt Rate= 19082/sec.

TOTAL  K 0:00:04.109 (41.1%)  U 0:00:00.859 ( 8.6%)  I 0:00:05.031 (50.3%)  DPC
0:00:00.093 ( 0.9%)  Interrupt 0:00:02.828 (28.3%)
       Total Interrupts= 193060, Total Interrupt Rate= 38612/sec.

Total Profile Time = 5000 msec

                                       BytesStart          BytesStop         Byt
esDiff.
    Available Physical Memory   ,       563130368,       560410624,        -2719
744
    Available Pagefile(s)       ,      2230300672,      2226749440,        -3551
232
    Available Virtual           ,      2132402176,      2104180736,       -28221
440
    Available Extended Virtual  ,               0,               0,
  0

                                  Total      Avg. Rate
    Context Switches     ,        73005,         14601/sec.
    System Calls         ,       214633,         42927/sec.
    Page Faults          ,         3814,         763/sec.
    I/O Read Operations  ,         1175,         235/sec.
    I/O Write Operations ,          433,         87/sec.
    I/O Other Operations ,         4569,         914/sec.
    I/O Read Bytes       ,       776604,         661/ I/O
    I/O Write Bytes      ,       129278,         299/ I/O
    I/O Other Bytes      ,       575190,         126/ I/O

—————————–
Results for Kernel Mode:
—————————–

OutputResults: KernelModuleCount = 227
Percentage in the following table is based on the Total Hits for the Kernel

Time   1762 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
ntkrnlpa                               1307       4984    74 %     6555979
hal                                     246       4984    13 %     1233948
HECI                                     44       4984     2 %      220706
igxpmp32                                 38       4984     2 %      190609
intelppm                                 36       4984     2 %      180577
win32k                                   33       5000     1 %      165000
mfehidk                                  15       5000     0 %       75000
HIPK                                     12       5000     0 %       60000
……

===> Processing Zoomed Module ntkrnlpa.exe…

—– Zoomed module ntkrnlpa.exe (Bucket size = 16 bytes, Rounding Down) ——-

Percentage in the following table is based on the Total Hits for this Zoom Module

Time   1307 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
KiDispatchInterrupt                     787       4984    46 %     3947632
READ_REGISTER_USHORT                    383       4984    22 %     1921147
READ_REGISTER_ULONG                     342       4984    20 %     1715489
READ_REGISTER_UCHAR                      41       4984     2 %      205658
KeReleaseInStackQueuedSpinLockFromDpcLevel         13       4984     0 %       65208
ZwYieldExecution                         10       4984     0 %       50160

……

================================= END OF RUN ==================================
============================== NORMAL END OF RUN ==============================
 

2.3.2 HAL

C:\Program Files\KrView\Kernrates>Kernrate_i386_XP.exe -v -s 5 -z hal
/==============================\

\==============================/
Date: 2009/06/10   Time: 18:25:00
Machine Name: MIKESL-MOBL1
Number of Processors: 2
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0f0b
Physical Memory: 2007 MB
Pagefile Total: 3856 MB
Virtual Total: 2047 MB
PageFile1: \??\C:\pagefile.sys, 2006MB
OS Version: 5.1 Build 2600 Service-Pack: 3.0
WinDir: C:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe -v -s 5 -z hal

KERNRATE: current IMAGEHLP SymOptions: UNDNAME DEBUG

Kernel Profile (PID = 0): Source= Time,
Using Kernrate Default Rate of 25000 events/hit
KERNRATE: IMAGEHLP symbol search path is: .;symsrv*symsrv.dll*c:\symbols*
http://
msdl.microsoft.com/download/symbols;C:\WINDOWS\System32\Drivers;C:\WINDOWS\System32;C:\WINDOWS
CallBack: Loading symbols for hal.dll…
DBGHELP: No header for hal.dll.  Searching for image on disk
DBGHELP: C:\Program Files\KrView\Kernrates\hal.dll – file not found
DBGHELP: hal.dll not found in .
DBGHELP: hal.dll not found in C:\WINDOWS\System32\Drivers
DBGHELP: C:\WINDOWS\System32\hal.dll – OK
DBGHELP: C:\WINDOWS\System32\hal.dll found
DBGHELP: .\halmacpi.pdb – file not found
DBGHELP: .\dll\halmacpi.pdb – file not found
DBGHELP: .\symbols\dll\halmacpi.pdb – file not found
CallBack: Finished Attempt to Load symbols for 806e4000 \WINDOWS\system32\hal.dll

DBGHELP: hal – public symbols
         c:\symbols\halmacpi.pdb\9875FD697ECA4BBB8A475825F6BF885E1\halmacpi.pdb
Starting to collect profile data

Will collect profile data for 5 seconds
===> Finished Collecting Data, Starting to Process Results

————Overall Summary:————–

P0     K 0:00:02.531 (50.6%)  U 0:00:00.453 ( 9.1%)  I 0:00:02.015 (40.3%)  DPC
0:00:00.000 ( 0.0%)  Interrupt 0:00:01.890 (37.8%)
       Interrupts= 119072, Interrupt Rate= 23814/sec.

P1     K 0:00:02.609 (52.2%)  U 0:00:00.656 (13.1%)  I 0:00:01.734 (34.7%)  DPC
0:00:00.031 ( 0.6%)  Interrupt 0:00:01.437 (28.7%)
       Interrupts= 114272, Interrupt Rate= 22854/sec.

TOTAL  K 0:00:05.140 (51.4%)  U 0:00:01.109 (11.1%)  I 0:00:03.750 (37.5%)  DPC
0:00:00.031 ( 0.3%)  Interrupt 0:00:03.328 (33.3%)
       Total Interrupts= 233344, Total Interrupt Rate= 46669/sec.

Total Profile Time = 5000 msec

                                       BytesStart          BytesStop         Byt
esDiff.
    Available Physical Memory   ,       111005696,       117772288,         6766
592
    Available Pagefile(s)       ,      1635250176,      1639911424,         4661
248
    Available Virtual           ,      2132402176,      2131013632,        -1388
544
    Available Extended Virtual  ,               0,               0,
  0

                                  Total      Avg. Rate
    Context Switches     ,        82262,         16452/sec.
    System Calls         ,       292287,         58457/sec.
    Page Faults          ,         6431,         1286/sec.
    I/O Read Operations  ,         1592,         318/sec.
    I/O Write Operations ,          506,         101/sec.
    I/O Other Operations ,         7687,         1537/sec.
    I/O Read Bytes       ,       344208,         216/ I/O
    I/O Write Bytes      ,       457023,         903/ I/O
    I/O Other Bytes      ,       951393,         124/ I/O

—————————–

Results for Kernel Mode:
—————————–

OutputResults: KernelModuleCount = 227
Percentage in the following table is based on the Total Hits for the Kernel

Time   1761 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
ntkrnlpa                               1223       5000    69 %     6115000
hal                                     258       5000    14 %     1290000
intelppm                                 74       5000     4 %      370000
HECI                                     55       5000     3 %      275000
win32k                                   32       5000     1 %      160000
igxpmp32                                 29       5000     1 %      145000
HIPK                                     22       5000     1 %      110000
mfehidk                                  15       5000     0 %       75000
……

===> Processing Zoomed Module hal.dll…

—– Zoomed module hal.dll (Bucket size = 16 bytes, Rounding Down) ——–
Percentage in the following table is based on the Total Hits for this Zoom Module

Time   258 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
READ_PORT_UCHAR                         143       5000    25 %      715000
READ_PORT_ULONG                         139       5000    24 %      695000
READ_PORT_USHORT                        139       5000    24 %      695000
HalpQueryPerformanceCounter              24       5000     4 %      120000
KeReleaseQueuedSpinLock                  21       5000     3 %      105000
KeAcquireInStackQueuedSpinLock           21       5000     3 %      105000
KfLowerIrql                               8       5000     1 %       40000
ExAcquireFastMutex                        7       5000     1 %       35000
KeGetCurrentIrql                          7       5000     1 %       35000
HalpGetCmosCenturyByte                    7       5000     1 %       35000
HalpReleaseSystemHardwareSpinLock          7       5000     1 %       35000
HalBeginSystemInterrupt                   6       5000     1 %       30000
KeAcquireQueuedSpinLock                   5       5000     0 %       25000
……

================================= END OF RUN ==================================
============================== NORMAL END OF RUN ==============================

2.3.3 IGXPMP32

C:\Program Files\KrView\Kernrates>Kernrate_i386_XP.exe -v -s 5 -z igxpmp32
/==============================\

\==============================/
Date: 2009/06/10   Time: 18:25:11
Machine Name: MIKESL-MOBL1
Number of Processors: 2
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0f0b
Physical Memory: 2007 MB
Pagefile Total: 3856 MB
Virtual Total: 2047 MB
PageFile1: \??\C:\pagefile.sys, 2006MB
OS Version: 5.1 Build 2600 Service-Pack: 3.0
WinDir: C:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe -v -s 5 -z igxpmp32

KERNRATE: current IMAGEHLP SymOptions: UNDNAME DEBUG

Kernel Profile (PID = 0): Source= Time,
Using Kernrate Default Rate of 25000 events/hit
KERNRATE: IMAGEHLP symbol search path is: .;symsrv*symsrv.dll*c:\symbols*
<a href="http://
“>“>”>”>”>”>”>http://
msdl.microsoft.com/download/symbols;C:\WINDOWS\System32\Drivers;C:\WINDOWS\System32;C:\WINDOWS
CallBack: Loading symbols for igxpmp32.sys…
DBGHELP: No header for igxpmp32.sys.  Searching for image on disk
DBGHELP: C:\Program Files\KrView\Kernrates\igxpmp32.sys – file not found
DBGHELP: igxpmp32.sys not found in .
DBGHELP: C:\WINDOWS\System32\Drivers\igxpmp32.sys – OK
DBGHELP: C:\WINDOWS\System32\Drivers\igxpmp32.sys found
DBGHELP: .\igxpmp32.pdb – file not found
DBGHELP: .\sys\igxpmp32.pdb – file not found
DBGHELP: .\symbols\sys\igxpmp32.pdb – file not found
SYMSRV:  The server name or address could not be resolved
SYMSRV:  The server name or address could not be resolved
SYMSRV:  c:\symbols\igxpmp32.pdb\01C98FF4DD054B6AB11509CF9CF04F171\igxpmp32.pdb
not found
SYMSRV: 
http://msdl.microsoft.com/download/symbols/igxpmp32.pdb/01C98FF4DD054B6
AB11509CF9CF04F171/igxpmp32.pdb not found
DBGHELP: C:\WINDOWS\System32\Drivers\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\System32\Drivers\sys\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\System32\Drivers\symbols\sys\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\System32\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\System32\sys\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\System32\symbols\sys\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\sys\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\symbols\sys\igxpmp32.pdb – file not found
DBGHELP: C:\WINDOWS\System32\Drivers\igxpmp32.pdb – file not found
DBGHELP: D:\ccViews\autobuild1_BR-0810-1787_14.36_Snapshot\gfx_Development\dump3
2\igfx\xp\gtune\AIM3Lib\igxpmp32.pdb – file not found
CallBack: Finished Attempt to Load symbols for b942f000 \SystemRoot\system32\DRI
VERS\igxpmp32.sys

DBGHELP: igxpmp32 – export symbols
Starting to collect profile data

Will collect profile data for 5 seconds
===> Finished Collecting Data, Starting to Process Results

————Overall Summary:————–

P0     K 0:00:02.812 (56.1%)  U 0:00:00.265 ( 5.3%)  I 0:00:01.937 (38.6%)  DPC
0:00:00.000 ( 0.0%)  Interrupt 0:00:01.812 (36.1%)
       Interrupts= 114753, Interrupt Rate= 22879/sec.

P1     K 0:00:02.937 (58.8%)  U 0:00:00.562 (11.3%)  I 0:00:01.500 (30.0%)  DPC
0:00:00.062 ( 1.3%)  Interrupt 0:00:01.578 (31.6%)
       Interrupts= 109952, Interrupt Rate= 21990/sec.

TOTAL  K 0:00:05.750 (57.4%)  U 0:00:00.828 ( 8.3%)  I 0:00:03.437 (34.3%)  DPC
0:00:00.062 ( 0.6%)  Interrupt 0:00:03.390 (33.9%)
       Total Interrupts= 224705, Total Interrupt Rate= 44941/sec.

Total Profile Time = 5000 msec

                                       BytesStart          BytesStop         Byt
esDiff.
    Available Physical Memory   ,       119107584,       107962368,       -11145
216
    Available Pagefile(s)       ,      1640800256,      1629646848,       -11153
408
    Available Virtual           ,      2132402176,      2102800384,       -29601
792
    Available Extended Virtual  ,               0,               0,
  0

                                  Total      Avg. Rate
    Context Switches     ,        78959,         15792/sec.
    System Calls         ,       301466,         60293/sec.
    Page Faults          ,         8704,         1741/sec.
    I/O Read Operations  ,          711,         142/sec.
    I/O Write Operations ,          197,         39/sec.
    I/O Other Operations ,         5430,         1086/sec.
    I/O Read Bytes       ,      1212541,         1705/ I/O
    I/O Write Bytes      ,        14701,         75/ I/O
    I/O Other Bytes      ,      1486190,         274/ I/O

—————————–

Results for Kernel Mode:
—————————–

OutputResults: KernelModuleCount = 227
Percentage in the following table is based on the Total Hits for the Kernel

Time   1764 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
ntkrnlpa                               1232       5000    69 %     6160000
hal                                     235       5000    13 %     1175000
intelppm                                 93       5015     5 %      463609
HECI                                     42       5015     2 %      209371
win32k                                   34       5000     1 %      170000
igxpmp32                                 32       5015     1 %      159521
mfehidk                                  23       5000     1 %      115000
HIPK                                     21       5000     1 %      105000
kmixer                                   11       5000     0 %       55000
KSecDD                                   10       5015     0 %       49850
……

===> Processing Zoomed Module igxpmp32.sys…

—– Zoomed module igxpmp32.sys (Bucket size = 16 bytes, Rounding Down) ——-

Percentage in the following table is based on the Total Hits for this Zoom Module

Time   32 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
xc_copp_agent                            30       5015    100 %      149551

================================= END OF RUN ==================================
============================== NORMAL END OF RUN ==============================

2.3.4 HECI

C:\Program Files\KrView\Kernrates>Kernrate_i386_XP.exe -v -s 5 -z heci
/==============================\

\==============================/
Date: 2009/06/10   Time: 18:48:17
Machine Name: MIKESL-MOBL1
Number of Processors: 2
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0f0b
Physical Memory: 2007 MB
Pagefile Total: 3856 MB
Virtual Total: 2047 MB
PageFile1: \??\C:\pagefile.sys, 2006MB
OS Version: 5.1 Build 2600 Service-Pack: 3.0
WinDir: C:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe -v -s 5 -z heci

KERNRATE: current IMAGEHLP SymOptions: UNDNAME DEBUG

Kernel Profile (PID = 0): Source= Time,
Using Kernrate Default Rate of 25000 events/hit
KERNRATE: IMAGEHLP symbol search path is: .;symsrv*symsrv.dll*c:\symbols*
<a href="http://
“>“>”>”>”>”>”>http://
msdl.microsoft.com/download/symbols;C:\WINDOWS\System32\Drivers;C:\WINDOWS\System32;C:\WINDOWS
CallBack: Loading symbols for HECI.sys…
DBGHELP: No header for HECI.sys.  Searching for image on disk
DBGHELP: C:\Program Files\KrView\Kernrates\HECI.sys – file not found
DBGHELP: HECI.sys not found in .
DBGHELP: C:\WINDOWS\System32\Drivers\HECI.sys – OK
DBGHELP: C:\WINDOWS\System32\Drivers\HECI.sys found
DBGHELP: .\HECI.pdb – file not found
DBGHELP: .\sys\HECI.pdb – file not found
DBGHELP: .\symbols\sys\HECI.pdb – file not found
SYMSRV:  The server name or address could not be resolved
SYMSRV:  The server name or address could not be resolved
SYMSRV:  c:\symbols\HECI.pdb\0A57FC20FA904FB8BAEC77C53EE39C821\HECI.pdb not found
SYMSRV: 
http://msdl.microsoft.com/download/symbols/HECI.pdb/0A57FC20FA904FB8BAE
C77C53EE39C821/HECI.pdb not found
DBGHELP: C:\WINDOWS\System32\Drivers\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\System32\Drivers\sys\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\System32\Drivers\symbols\sys\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\System32\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\System32\sys\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\System32\symbols\sys\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\sys\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\symbols\sys\HECI.pdb – file not found
DBGHELP: C:\WINDOWS\System32\Drivers\HECI.pdb – file not found
DBGHELP: d:\ccviews\autobuild1_br-0707-1049_2.6_snapshot\amt_development\sw\src\
drivers\heci\driver\objfre_wxp_x86\i386\HECI.pdb – file not found
CallBack: Finished Attempt to Load symbols for ba308000 \SystemRoot\system32\DRI
VERS\HECI.sys

DBGHELP: HECI – no symbols loaded
Starting to collect profile data

Will collect profile data for 5 seconds
===> Finished Collecting Data, Starting to Process Results

————Overall Summary:————–

P0     K 0:00:02.296 (45.9%)  U 0:00:00.343 ( 6.9%)  I 0:00:02.359 (47.2%)  DPC
0:00:00.000 ( 0.0%)  Interrupt 0:00:01.359 (27.2%)
       Interrupts= 115350, Interrupt Rate= 23070/sec.

P1     K 0:00:02.203 (44.1%)  U 0:00:00.531 (10.6%)  I 0:00:02.265 (45.3%)  DPC
0:00:00.015 ( 0.3%)  Interrupt 0:00:01.093 (21.9%)
       Interrupts= 110550, Interrupt Rate= 22110/sec.

TOTAL  K 0:00:04.500 (45.0%)  U 0:00:00.875 ( 8.8%)  I 0:00:04.625 (46.3%)  DPC
0:00:00.015 ( 0.2%)  Interrupt 0:00:02.453 (24.5%)
       Total Interrupts= 225900, Total Interrupt Rate= 45180/sec.

Total Profile Time = 5000 msec

                                       BytesStart          BytesStop         Byt
esDiff.
    Available Physical Memory   ,       116039680,       114077696,        -1961
984
    Available Pagefile(s)       ,      1642790912,      1640251392,        -2539
520
    Available Virtual           ,      2132402176,      2105364480,       -27037
696
    Available Extended Virtual  ,               0,               0,
  0

                                  Total      Avg. Rate
    Context Switches     ,        91187,         18237/sec.
    System Calls         ,       270278,         54056/sec.
    Page Faults          ,         6482,         1296/sec.
    I/O Read Operations  ,          269,         54/sec.
    I/O Write Operations ,           41,         8/sec.
    I/O Other Operations ,         5361,         1072/sec.
    I/O Read Bytes       ,      1551742,         5769/ I/O
    I/O Write Bytes      ,         2836,         69/ I/O
    I/O Other Bytes      ,      2381876,         444/ I/O

—————————–
Results for Kernel Mode:
—————————–

OutputResults: KernelModuleCount = 227
Percentage in the following table is based on the Total Hits for the Kernel

Time   1754 hits, 25000 events per hit ——–
Module                                Hits   msec  %Total  Events/Sec
ntkrnlpa                               1273       5000    72 %     6365000
hal                                     222       5000    12 %     1110000
intelppm                                 70       5000     3 %      350000
HECI                                     47       5000     2 %      235000
igxpmp32                                 34       5000     1 %      170000
win32k                                   30       5000     1 %      150000
mfehidk                                  14       5000     0 %       70000
HIPK                                     12       5000     0 %       60000
KSecDD                                   12       5000     0 %       60000
……

===> Processing Zoomed Module HECI.sys…

Symbol Enumeration failed (or no symbols found) on module HECI in CreateZoomedModuleList, Error Code= 0x000001e7
No Hits or No symbols found for module HECI.sys
================================= END OF RUN ==================================
============================== NORMAL END OF RUN ==============================

 

Step 3: Analysis & Research

In my case, it appears that there’s some process (or processes) that’s calling the RtlIpv6StringToAddressA kernel API an excessive number of times.  From what I’ve seen in my research, not only is this a common complaint, but it’s been reported by at least two other T61 users in the last year or so – without any definitive solution.

And what’s this RtlIpv6StringToAddressA API do exactly?  Well, it translates a string form of an IPv6 address to a more strongly-typed form of the address.  Here’s what I know (and what I don’t) about IPv6 in my T61 and Windows XP SP3:

  • The “teredo” driver hasn’t been installed in Windows XP SP3 (“teredo” is the code name used by the Microsoft Windows team that developed the IPv6 stack)
  • There are a number of third-party packages on my system, but the newest to have been introduced are:
    • the AMT 2.6-supporting software and driver (v “2.6.30.1014 built by: WinDDK”, driver date 7/12/2007)
    • the PGP whole disk encryption package
    • the Intel 4965AGN wireless driver (v12.4.0.21), installed earlier
    • the Intel PROSet/Wireless Wifi Software (v12.1.1.0), installed earlier
  • Knowing that AMT is meant to provide a remote networking interface to Intel chipset functionality, and knowing that this problem didn’t occur before the AMT & PGP installs took place, my primary suspect (as the source of these excessive IPv6-related API calls) becomes the AMT software.
  • However, I consulted with colleagues on my team that are intimately familiar with the internals of HECI.sys, and they have confirmed that it doesn’t call the RtlIpv6StringToAddressA() Win32 API
  • That prompted me to search the strings embedded in all the drivers and libraries I could find in C:\Windows\System32.  Here were the files where FINDSTR.EXE could find that API call:
    • dnsapi.dll
    • mswsock.dll
    • ntdll.dll
    • ntkrnlpa.exe
    • ntoskrnl.exe
    • Drivers\http.sys

Disabled the following Devices (which didn’t help):

  • Ricoh Memory Stick Controller
  • Ricoh SD/MMC Host Controller
  • Ricoh xD-Picture Card Controller
  • OHCI Compliant IEEE 1394 Host Controller
  • ThinkPad Modem
  • Ricoh R/RL/5C476(II) or Compatible CardBus Controller
  • SDA Standard Compliant SD Host Controller

Downloaded and installed the following driver updates:

  • bluetooth v5.5.09.6400
  • Ultranav v11.1.21.2
  • Power Manager v1.60
  • ACPI Power Management v1.53
  • Integrated camera v5.8.49004.0
  • Hotkey driver v2.20.0001

Step 4: Make sure you try later and earlier driver versions

Bottom line: I replaced the network driver on my system with one that appeared *months* after I started down this road, and I finally stopped seeing these CPU spews occur.

I was lucky – the vendor dropped a new driver version that happened to resolve this error.  Until that point, I was stuck – nothing I could do to my OS to resolve it.

And if your vendor hasn’t released a *later* driver that eliminates the issue, I’ll recommend something radical – try an *earlier* version of the driver (e.g. like these folks here).  Sometimes programmers introduce errors that didn’t exist in previous versions of their code, even if they didn’t intend to affect such a broad issue.  (Especially when they didn’t intend to.)  You might find a 6-, 12- or 24-month-old driver that still works well enough (for the time being) and doesn’t exhibit the performance drag you’re seeing.  What’s worse – an older driver that works but doesn’t have the latest features, or a driver that has the features but also craters your system performance?

Last resort: you may find that disabling the device in the OS will finally shut up a misbehaving driver (i.e. that the driver won’t load, so it won’t spawn all these excessive interrupts).  This isn’t a great long-term solution, and it’s only viable for hardware you don’t need to use (e.g. a biometric device such as fingerprint reader), but it’s one last thing to consider before junking your whole system.

Notes: related articles

Sysinternals Forums

  1. Hardware interrupts using excessive CPU
  2. Kernrate trace + RtlIpv6StringToAddressA (?)

Lenovo Forums

  1. x200 Hardware Interrupts constantly at 40% of CPU resources
  2. X200: hardware interrupts causes cpu load?
  3. T61: Problems with Hardware Interrupts when on battery
  4. Problem on S12 with flood of interrupts from Broadcom wireless-G after Windows resume
Advertisements

3 thoughts on “Lenovo T61? High CPU usage? Look for bad drivers (…spewing hardware interrupts)

  1. this is VERY annoying an only happens when my T61 is running on battery, i thought the issue was with the Intel PRO/Wireless 3945ABG driver – the problem is if I update I then get blue screens – which I guess is worse. Only solution I found was to never unplug my laptop. Did you find a solution?

    Like

  2. Nice post Mike. There are many things that can cause interrupt storms, usually hardware or bad drivers. I had one on a Thinkpad X61s which I was chasing for months. I knew it had something to do with the hard drive. I replaced it (needed more space anyway) but no luck. All drivers up to date. Finally, I noticed that I could initiate the storm by switching to battery. I suspected the Lenovo Power Manager, but I didn't want to remove it. I noticed a setting that will stop the disk drive when the machine is idle. I set it to “never” for battery use and the problem disappeared.

    I suspect there might be a problem with the Lenovo Power Manager device driver. If your interrupt storm seems battery-related, try disabling the Power Manager's ability to stop the disk on idle.

    Like

  3. I am using the newer toolkit from microsoft.

    I get as far as knowing that it is calling 0xfffff80002818e78 in hal.dll a ton of times and eating the processor time on my 1st cpu. How can i find out what 0xfffff80002818e78 is? It does not show up anywhere on google.

    (Windows Vista (x64) .. ya i know but i cant buy 7 right now)

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s