I just experienced one of the usual “hangs” in Microsoft Word 2003 that happen pretty regularly when working on multiple, large documents for any significant length of time. The WINWORD.EXE process is taking up 50% of my CPU (which as a dual-core processor, means that there’s a thread that’s somehow taking up 100% of the logical CPU for which it’s scheduled), and has been doing this for at least ten minutes now with no letup.
In my experience, these “runaway consumers of CPU cycles” just never quiesce — eventually I have to decide to kill WINWORD.EXE from Task Manager or Process Explorer, or else the offending process will consume that “CPU” from now until the end of time.
Maybe I was just bored today, ’cause rather than just kill the runaway process, I decided to see if I could dig a little deeper. [I think Mark Russinovich has infected me with the idea that these are surmountable problems — though I wouldn’t dream of trying to make a favourable comparison between my haphazard hacking and Mark’s mad skillz.]
Let’s have a look at a few screenshots, shall we?
Once I decided to investigate, I fired up Process Monitor and limited it to WINWORD.EXE. The activity logged is almost entirely like this:
Don’t strain your eyes too badly on this — I’ve included this just to note the incessant nature of the major activity here: a rapidly-repeating WriteFile operation on a single Temporary file (~WRS1954.tmp), interrupted once in a while by a smaller (Length of anywhere between 512 and 3072) ReadFile operation on the same file:
Interestingly, these ReadFile operations occur in an irregular but repeating pattern:
Also of note is the fact that this temporary file is constantly growing in size, and not just temporarily swelling the data stored within a pre-allocated file — I confirmed that by right-clicking on the Path in Process Monitor, chose “Jump to Location…” and simply Refreshed on the folder to observe the reported file Size was incrementing every time (over a span of 50 minutes, it grew by approx. 222 Kb, or 233657856 bytes).
If I look closer at the Handles for WINWORD.EXE, I notice that this is one of many Temporary files open by Word, which implies that the problem we’re experiencing is very specific to one type of unexpected activity (and not just affecting Word’s handling of Temporary files):
(Note: I intentionally “hashed” out the original filename, which is the last entry in the list pictured.)
One other piece of information: I tried to resize the Window in which the active document was being displayed. Word appended “(Not Responding)” to its Title Bar, and that seems to have changed the behaviour profile of the WINWORD.EXE thread. Since that point in time, Process Monitor did not record any further increase in the size of the ~WRS1954.tmp file, but recorded one additional ReadFile operation on the WINWORD.EXE file itself (Offset: 3998720, Length: 4096). [WINWORD.EXE File version = 11.0.8169.0, Digital signature timestamp = May 31, 2007 12:38:03 PM]
Finally, I grabbed a full memory dump of the WINWORD.EXE process, using windbg.exe and the .dump /ma command. I can’t say I know much about debugging a dump file, but I’ve got it on the off-chance that I ever find a good guide to debugging.
What Caused This?
Three circumstances I think contributed to this, though in my opinion none of them should lead to hung process (since I’ve done this more often without incident):
- I had opened a Word 2003 document directly from Outlook (it was attached to an email).
- The document had Track Changes enabled, and I’d already added Comments throughout the document.
- In the Comment I was just editing, it had scrolled off screen…
…and I had just attempted to apply formatting (I’d typed [Ctrl]-B and [Ctrl]-I rapidly,to bold and italicize) to a single word in the Track Changes panel below the document (the one that opens automatically when you keep typing in Comments that have already “scrolled off screen”).
(Note: I intentionally redacted the confidential text — but it sure ain’t artistic)
Caveat: While my experience with Word over the years has taught me that heavy use and abuse of the Comments feature leads to instability, I’m still miffed that I’d lose the recent batch of edits just because I’d foolishly tried to emphasize my point using basic formatting in a Comment.
So What Can We Conclude So Far?
I don’t know much about reading a stack trace, so this is all guesstimation on my part (plus a little intelligence gathered from a good Russinovich article). The WINWORD stack indicates that Word has called ntkrnlpa.exe aka the Windows kernel. It looks like it’s basically stalled (KiDispatchInterrupt) while creating a new thread (KiThreadStartup). Looking lower in the stack, the first caller in WINWORD is labelled only “0x1a772b” — whatever that is, it’s beyond my skills to unearth the identity of that API.
The next one down in the stack, however, is wdGetApplicationObject(). There’s no information in MSDN that references this function, though a few pages on the ‘net do allude to it (mostly in the same kinds of searches I made). The best info I could find was here, which I’m guessing is Word’s way of getting a handle to the overall Word “application object”. However, without any further context, it’s very hard to imagine what is really going on here.
Turning to the GdiPlus stack, it looks like another kernel call that’s stalled (many similar references to “WaitForMultipleObjects” functions), all boiling down to a call to the GdipCreateSolidFill() API. From what MSDN documents, this seems like a pretty innocuous function, having nothing to do with temporary files, only to do with UI. I can understand this — by the time I’d looked at the GdiPlus stack, I believe the UI had “hung” (aka it was non-responsive). So while this thread was also active, it’s almost impossible for it to be involved in this issue.
Then the only thing I know for sure is the temp file was growing due to some runaway operation, and the runaway operation (which was probably related to an attempt to format Comment text) at some point obtained a handle to the Word application object.
I’m guessing that the only way to get any closer to the root cause would be to dig into the memory dump. And…bugger me, the dump I grabbed ended up with this as its STACK_TEXT (from !analyze -v):
0adaffc8 7c9507a8 00000005 00000004 00000001 ntdll!DbgBreakPoint
0adafff4 00000000 00000000 00000000 00000000 ntdll!DbgUiRemoteBreakin+0x2d
Guess that’s “the wall” for me.