Debugging a Word 2003 runaway thread…but not successfully

I just experienced one of the usual “hangs” in Microsoft Word 2003 that happen pretty regularly when working on multiple, large documents for any significant length of time.  The WINWORD.EXE process is taking up 50% of my CPU (which as a dual-core processor, means that there’s a thread that’s somehow taking up 100% of the logical CPU for which it’s scheduled), and has been doing this for at least ten minutes now with no letup.

In my experience, these “runaway consumers of CPU cycles” just never quiesce — eventually I have to decide to kill WINWORD.EXE from Task Manager or Process Explorer, or else the offending process will consume that “CPU” from now until the end of time.

Maybe I was just bored today, ’cause rather than just kill the runaway process, I decided to see if I could dig a little deeper.  [I think Mark Russinovich has infected me with the idea that these are surmountable problems — though I wouldn’t dream of trying to make a favourable comparison between my haphazard hacking and Mark’s mad skillz.]

Process Explorer

Let’s have a look at a few screenshots, shall we?

image 
(Performance stats, in case that’s useful to anyone — though it doesn’t provide me any telling evidence)

image
(Listing of the threads currently instantiated in WINWORD.EXE including the main thread, which is the one causing all the problems)

image
(Stack contents for the WINWORD.EXE thread)

image
(Stack contents for GdiPlus.DLL thread, which was the only other thread with any activity under the “CSwitch Delta” heading)

Process Monitor

Once I decided to investigate, I fired up Process Monitor and limited it to WINWORD.EXE.  The activity logged is almost entirely like this:

image

Don’t strain your eyes too badly on this — I’ve included this just to note the incessant nature of the major activity here: a rapidly-repeating WriteFile operation on a single Temporary file (~WRS1954.tmp), interrupted once in a while by a smaller (Length of anywhere between 512 and 3072) ReadFile operation on the same file:

image

Interestingly, these ReadFile operations occur in an irregular but repeating pattern:

image

Also of note is the fact that this temporary file is constantly growing in size, and not just temporarily swelling the data stored within a pre-allocated file — I confirmed that by right-clicking on the Path in Process Monitor, chose “Jump to Location…” and simply Refreshed on the folder to observe the reported file Size was incrementing every time (over a span of 50 minutes, it grew by approx. 222 Kb, or 233657856 bytes).

If I look closer at the Handles for WINWORD.EXE, I notice that this is one of many Temporary files open by Word, which implies that the problem we’re experiencing is very specific to one type of unexpected activity (and not just affecting Word’s handling of Temporary files):
image
(Note: I intentionally “hashed” out the original filename, which is the last entry in the list pictured.)

One other piece of information: I tried to resize the Window in which the active document was being displayed.  Word appended “(Not Responding)” to its Title Bar, and that seems to have changed the behaviour profile of the WINWORD.EXE thread.  Since that point in time, Process Monitor did not record any further increase in the size of the ~WRS1954.tmp file, but recorded one additional ReadFile operation on the WINWORD.EXE file itself (Offset: 3998720, Length: 4096).  [WINWORD.EXE File version = 11.0.8169.0, Digital signature timestamp = May 31, 2007 12:38:03 PM]

Finally, I grabbed a full memory dump of the WINWORD.EXE process, using windbg.exe and the .dump /ma command.  I can’t say I know much about debugging a dump file, but I’ve got it on the off-chance that I ever find a good guide to debugging.

What Caused This?

Three circumstances I think contributed to this, though in my opinion none of them should lead to hung process (since I’ve done this more often without incident):

  1. I had opened a Word 2003 document directly from Outlook (it was attached to an email).
  2. The document had Track Changes enabled, and I’d already added Comments throughout the document.
  3. In the Comment I was just editing, it had scrolled off screen…
    image
    …and I had just attempted to apply formatting (I’d typed [Ctrl]-B and [Ctrl]-I rapidly,to bold and italicize) to a single word in the Track Changes panel below the document (the one that opens automatically when you keep typing in Comments that have already “scrolled off screen”).
     image
    (Note: I intentionally redacted the confidential text — but it sure ain’t artistic)

Caveat: While my experience with Word over the years has taught me that heavy use and abuse of the Comments feature leads to instability, I’m still miffed that I’d lose the recent batch of edits just because I’d foolishly tried to emphasize my point using basic formatting in a Comment.

So What Can We Conclude So Far?

I don’t know much about reading a stack trace, so this is all guesstimation on my part (plus a little intelligence gathered from a good Russinovich article).  The WINWORD stack indicates that Word has called ntkrnlpa.exe aka the Windows kernel.  It looks like it’s basically stalled (KiDispatchInterrupt) while creating a new thread (KiThreadStartup).  Looking lower in the stack, the first caller in WINWORD is labelled only “0x1a772b” — whatever that is, it’s beyond my skills to unearth the identity of that API.

The next one down in the stack, however, is wdGetApplicationObject().  There’s no information in MSDN that references this function, though a few pages on the ‘net do allude to it (mostly in the same kinds of searches I made).  The best info I could find was here, which I’m guessing is Word’s way of getting a handle to the overall Word “application object”.  However, without any further context, it’s very hard to imagine what is really going on here.

Turning to the GdiPlus stack, it looks like another kernel call that’s stalled (many similar references to “WaitForMultipleObjects” functions), all boiling down to a call to the GdipCreateSolidFill() API.  From what MSDN documents, this seems like a pretty innocuous function, having nothing to do with temporary files, only to do with UI.  I can understand this — by the time I’d looked at the GdiPlus stack, I believe the UI had “hung” (aka it was non-responsive).  So while this thread was also active, it’s almost impossible for it to be involved in this issue.

Then the only thing I know for sure is the temp file was growing due to some runaway operation, and the runaway operation (which was probably related to an attempt to format Comment text) at some point obtained a handle to the Word application object.

I’m guessing that the only way to get any closer to the root cause would be to dig into the memory dump.  And…bugger me, the dump I grabbed ended up with this as its STACK_TEXT (from !analyze -v):

0adaffc8 7c9507a8 00000005 00000004 00000001 ntdll!DbgBreakPoint
0adafff4 00000000 00000000 00000000 00000000 ntdll!DbgUiRemoteBreakin+0x2d

Guess that’s “the wall” for me.

Advertisements

5 thoughts on “Debugging a Word 2003 runaway thread…but not successfully

  1. Hey,Thanks for the detail on what you did to begin troubleshooting this issue. After years of working as an “engineer” in IT, I have decided to start learning what is really going on in the systems I’m using. While I can’t say that I feel much closer to resolving the same winword.exe issue on my machine, I did enjoy reading your thought process on troubleshooting. When troubleshooting a network issue, I often go back to binary to find the issue and it looks to me like you are doing the same by reading the process.Thanks for the post!

    Like

  2. I’m having a simialr problem. I tracked the tmp file, and in my case it had got to 27Mb over the course of hour!wdGetApplicationObject() looks similar to a call I used to use in VB to get a handle on Word for OLE using OLE to control if from a different application.

    Like

  3. I have a belly bug…can that be Debugged?
    Toe Jam did not work thus far. I might try a dump after breakfast if it will come out.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s