When will DevSecOps resemble DevOps?

https://www-forbes-com.cdn.ampproject.org/c/s/www.forbes.com/sites/jasonbloomberg/2017/11/20/mitigate-digital-transformation-cybersecurity-risk-with-devsecops/amp/

Another substance-free treatise on the glories of DevSecOps.

“Security is everyone’s job”, “everyone should care about security” and “we can’t just automate this job” seem to be the standard mantras, a decade on.

Which is entirely frustrating to those of us who are tired of security people pointing out the problems and then running as soon as there’s talk of the backbreaking labour of actually fixing the security issues, let alone making substantive system improvements that reduce their frequency in the future.

Hell, we even get a subheading that implies it’ll advance security goals in a CI/CD world: “The Role of Tooling in DevSecOps”. Except that there’s nothing more than a passing wave hello to Coverity (a decent static analysis vendor, but not the start nor the finish of the problem space) and more talk of people & process.

Where are the leading thinkers on secure configuration of your containers? Where’s the automated injection of tools that can enforce good IAM security and correct for the bad?

I am very tired of chasing Lucy’s football.

I’m tired of going out to DevSecOps discussions at meetups and conferences and hearing nothing that sounds like they “get” DevOps.

DevOps works in service of the customers, developers and the business by streamlining delivery, reducing the friction of release and making it possible to get small changes out as fast and frequently as possible.

I’ve asked at each of those discussions, “What tools and automation can you recommend that gets security integrated into the CI/CD chain?”

And I’ve heard a number of unsatisfying answers, from “security is everyone’s job and they should be considering it before their code gets committed” all the way through to “we can’t talk about tools until we get the culture right”. Which are all just tap-dancing dodges around the basic principle: the emperor has no clothes.

If DevSecOps is nothing more than “fobbing the job off on developers” and “we don’t recommend or implement any tools in the CI/CD chain”, then you have no business jumping on the DevOps bandwagon as if you’re actively participating in the process.

If you’re relying merely on the humans (not the technology) to improve security, and on top of that you’re pushing the problem onto the people *least* expert in the problem space, how can you possibly expect to help the business *accelerate* its results?

Yes I get that DevOps is more than merely tools, but if you believe Gene Kim (as I’m willing to do), it’s about three principles for which tools are an essential component:

  1. Flow (reduce the friction of delivery) and systems thinking (not kicking the can down to some other poor soul)
  2. Amplify feedback loops (make it easy and obvious to learn from mistakes)
  3. Create a culture of learning from failure.

Now, which of those does your infosec approach support?

Hell, tell me I’m wrong and you’ve got a stack of tooling integrated into your DevOps pipeline. Tell me what kinds of tools/scripts/immutable infrastructure you’ve got in that stack. I will kiss your feet to find out what the rest of us are missing!

Edit: thoughts

  • Obviously I’m glossing over some basic tools everyone should be using: linters.  Not that your out-of-the-box linter is going to directly catch any significant security issues, no – but that if you don’t even have your code following good coding standards, how the hell will your senior developers have the attention and stamina to perform high-quality, rapid code reviews when they’re getting distracted by off-pattern code constructions?
  • Further, all decent linters will accept custom rules, disabled/info-only settings to existing rules – giving you the ability to converge on an accepted baseline that all developers can agree to follow, and then slowly expand the footprint of those rules as the obvious issues get taken care of in early rounds.
  • Oh, and I stumbled across the DevSecCon series, where there are likely a number of tantalizing tidbits
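
To make the custom-rules point concrete, here's a rough sketch of what a home-grown lint rule might look like, using nothing but Python's standard `ast` module. The list of banned calls and the function name `find_banned_calls` are made up for illustration; a real team would agree on its own baseline and would probably plug the rule into whatever linter it already uses rather than run it standalone.

```python
import ast

# Hypothetical baseline: call names the team has agreed to flag.
BANNED_CALLS = {"eval", "exec"}


def find_banned_calls(source, filename="<string>"):
    """Return sorted (line, name) pairs for each direct call to a banned name."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        # Only calls by bare name, e.g. eval(...), are matched in this sketch.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = "x = eval(user_input)\nprint(x)\nexec('y = 1')\n"
for line, name in find_banned_calls(sample):
    print(f"line {line}: call to {name}() is not allowed")
```

Starting with a tiny rule set like this and growing it over time is the "converge on a baseline, then expand" approach described above.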

Edit: found one!

Here’s a CI-friendly tool: Peach API Security

  • Good news: built to integrate directly into the DevOps CI pipeline, testing the OWASP Top Ten against your API.
  • Bad news: I’d love to report something good about it, but the evaluation experience is frustratingly disjointed and incomplete.  I’m guessing they don’t have a Product Manager on the job, because there are a lot of missing pieces in the sales-evaluation-and-adoption pipeline:
    • Product Details are hosted in a PDF file (rather than online, as is customary today), linked as “How to Download” but titled “How to Purchase”
    • Most “hyperlinks” in the PDF are non-functional
    • Confusing user flow to get to additional info – “Learn More” next to “How to Download” leads to a Data Sheet, the footer includes a generic “Datasheets” link that leads to a jumbled mass of overly-whitespaced links to additional documents on everything from “competitive cheatsheets” to “(randomly-selected-)industry-specific discussion” to “list of available test modules”
    • Documents have no common look-and-feel, layout, topic flow or art/branding identity (almost as if they’re generated by individuals who have no central coordination)
    • There are no browseable/downloadable evaluation guides to explain how the product works, how to configure it, what commands to use to integrate it into the various CI pipelines, how to read the output, example scripts to parse and alert on the output – lacking this, I can’t gain confidence that this tool is ready for production usage
    • No running/interrogable sample by which to observe the live behaviour (e.g. an AWS instance running against a series of APIs, whose code is hosted in public GitHub repos)
  • I know the guys at Deja Vu are better than this – their security consulting services are awesome – so I’m mystified why Peach Tech seems the forgotten stepchild.

Edit: found another!

Neuvector is fielding a “continuous container security” commercial tool.  This article is what tipped me off about them, and it happens to mention a couple of non-commercial ideas for container security that are worth checking out as well.

Edit: and an open source tool!

Zed Attack Proxy (ZAProxy), coordinated by OWASP, and hosted on github.  Many automatable, scripted capabilities to search for security vulnerabilities in your web applications.
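
For the sake of argument, this is the sort of glue I'd hope to see between a scanner and the CI pipeline: a small gate that fails the build when serious findings turn up. It's a minimal sketch, assuming a scanner that can export its findings as JSON. The report shape, the severity names and the functions `blocking_findings` and `gate` are all invented for illustration and are not the actual output format of ZAP or any other tool.

```python
import json

# Hypothetical severity scale; a real scanner's report will use its own names.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3}


def blocking_findings(report_json, threshold="high"):
    """Return the findings at or above the threshold severity."""
    findings = json.loads(report_json)["findings"]
    limit = SEVERITY_RANK[threshold]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= limit]


def gate(report_json, threshold="high"):
    """Return a process exit code: 0 lets the build continue, 1 fails it."""
    blockers = blocking_findings(report_json, threshold)
    for finding in blockers:
        print(f"BLOCKING [{finding['severity']}] {finding['title']}")
    return 1 if blockers else 0


# Made-up report, standing in for whatever the scanner would write to disk.
example = json.dumps({"findings": [
    {"title": "Missing security header", "severity": "low"},
    {"title": "SQL injection in /search", "severity": "high"},
]})
exit_code = gate(example)
```

In a pipeline step you would pass the result to `sys.exit()` so the CI server sees a failing status, and you could start with a lenient threshold and tighten it as the backlog shrinks.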

 

 


Hiring in the Kafka universe of BigCorp: a play in infinite acts

Ever wondered why it takes so long for a company to get around to deciding whether to hire you?

The typical hiring “process” as I’ve observed it from inside the belly of two beasts (Microsoft and Intel – though I gather this is typical of most large, and many small, companies):

  • “yeah, we’ve got two heads requested, has to get through Mid-Year Budget Adjustment Review Fuckup”
  • “update? Yeah, MYBARF is taking a little longer than usual, but I’m hearing we’re likely to get the heads, so I’ve started drafting the job req”
  • “new emergency project announced – I’ll be heads-down for a few weeks with my key engineer – BTW we lost one of the heads to another project, last one isn’t approved yet”
  • “yeah, MYBARF got approved last month but the open head is still under negotiation”
  • “OK the head is approved – I lost the draft req, could someone volunteer to write one up for me?”
  • “HR had some feedback on the req language”
  • “we posted the req”
  • “I’ll have time to review resumes from HR in a week”
  • “HR has no idea how to screen for this job so I had to reject the initial batch of resumes”
  • “OK, I’ll have time to phone screen starting next week”
  • “I haven’t seen any mind-blowing candidates yet so I’m talking to HR *again* about my expectations”
  • “Can you do a tech screen tomorrow morning between 7:30 and 8:15? That’s the only time one candidate has for us to talk…”

Blaming the end user, docket #257: “many consumers still untrained on privacy risks”


I’m disappointed at the continued “blame the victim” framing these kinds of articles take – as if it’s a simple matter of changing the behaviour of hundreds of millions of consumers every day, it’s their own fault and no one else is culpable for nakedly exploiting this fact of human behaviour.  Makes my blood boil.

Let’s take it as a given that when things get so complex that you need to create and force training on masses of end users, you have failed to design a system with which the end users can reasonably succeed.

In the future, as in the past, when people say “so we’re going to build training for that” I will continue to slow down the conversation and ask “is there a way for us to refactor the system that does not require separate and egregious training?”

Study: Many Consumers Still Untrained On Privacy Risks:

Despite a high rate of concern about online threats, most consumers still do not pay much attention to their privacy settings in social media, and few have had any online security training according to a Harris Interactive survey of more than 2,000 adults sponsored by security vendor ESET… More than half of consumers have not read the most recent privacy policy for their social media accounts, the survey says. About 20% of consumers have never made any changes to the default privacy settings in their social media accounts. “This finding is worrying because of the very ‘open’ nature of most default social media settings, sometimes set by the social network operator to permit the widest possible use of your information,” ESET says in a blog about the study. “It is hard to think that everyone who leaves the default settings in place is aware of the implications.”

TED Talks: performance art, not mere compressed learning

A perfect example of what I mean – Shane Koyczan: "To This Day" … for the bullied and beautiful:

And another that still resonates with me, long after the fact – Jill Bolte Taylor’s stroke of insight:

I’ve recently encountered the predictable yet still surprising backlash against TED Talks – backlash in the form of long rants by people I respect, but whose resistance and cynicism I can’t fully understand. Since I’m usually at the forefront of cynical distancing from something so popular, this exacerbated my naive shock and dismay.

I’m a convert to the TED phenomenon, and wonder every day when I still haven’t gotten around to watching more of them – because every time I do sit down for one, it makes me feel more a part of a global, expansive human experience, and gives a booster shot to my hope that there’s far more to life than drudgery, suffering and isolation.

[In response to complaints that TED Talks are simplistic, reductionist, and a one-way "conversation" to an incredibly privileged audience] I’ve been pleasantly surprised by many talks, and genuinely affected by a few. Privilege and reductionism aside, they’re still 20 minutes more I spend learning about a subject or having my perspective widened than I’d otherwise spend.

Further, I find a deep pool of irony in complaining on Facebook about TED’s short attention span.

[In response to concerns that a one-way compressed monologue doesn’t suit some people’s learning styles, and that the one-way nature and a subtle "attitude" embedded in it creeps some people out] I can easily accept that it isn’t suitable to how your brain learns – I know how much of a learning failure I am through books, and how well I assimilate and integrate new ideas through the Meetup approach. Which tells me that for me, Meetups or other forums where we get to have lectures where questions are welcome, plus loosely related discussions around it, are my ideal learning model.

Close second is the unconference like Agile Open Northwest – where we get to hear lots of 0-day thoughts shared by people who want a very barrier-free, interactive discussion on subjects that are just-proposed-today and are low risk (since they’re selected through vote-with-your-feet) so high-value subjects abound for nearly everyone. No "selection" (aka pre-screening) committees so no groupthink filters.

But still none of this invalidates the 20-minute, polished-and-scrubbed summary of decades worth of work or life. If you can’t convey one good idea in 20 minutes, I sure ain’t giving you an hour or a book’s worth of attention. Must be why reading long-form books or journal articles seems so excruciating to me now – instead of one thesis, it seems to grant license to jam in several loosely related thoughts.

Maybe it’s the talks I’ve seen and remember – speakers from whom I detect a subtle nervousness, a little extra humanity – not the supra-polished talks that look like they’ve been given a million times and couldn’t provide less of a connection to their audience if they were delivered from within a gameshow soundproof booth.

a little humanity from the king

The plague of “smart refrigerators”

I think we’ve all by now heard of the mad, magical future in which your new refrigerator will have the intelligence to know when you’ve just run out of milk and will automatically order more for you. A perfect digital servant, that just happens to know exactly which items in your fridge you need repeatedly, at a perfect frequency to match their consumption. But what about stuff I bought once and no longer want? What about the milk that went bad (even before the due date) and has to be poured out all at once? And what about all the commodities I keep on the shelf, and put in the fridge once I open them?

This so-called “smart fridge” is one of those nearly-generic, ubiquitous, almost brainless examples trotted out as a stand-in for future tech, just as we see those stupid example apps show up on every new “extensible” piece of technology (phone, widgets framework, whatever) – the stocks, sports scores and weather apps. The apps that *no one* ever uses more than the first week of owning that tech (well, I’m sure there’s someone – like the dev – who must use them, but no one I know – and not like “I don’t know anyone who will admit to buying a Michael Jackson album while he was alive”).

Which reminds me of the foolish crapware that used to show up only on new PCs – but now ships with some Android phones and with all “smart TVs”. Ugh – I saw a report recently (https://www.npdgroupblog.com/internet-connected-tvs-are-used-to-watch-tv-and-thats-about-all/) that most smart TV users just watch live, streamed or pre-recorded content on their TVs, and almost none use the “smart” apps (generally less than 10% of smart TVs). In my experience they’re a resource of last resort – like when everything else has stopped working you’ll try them, but dog help you if you try willingly – hopes dashed, spirit mashed, ego crashed.

Which also reminds me of a great blog article by Scott Hanselman (My car ships with crapware http://www.hanselman.com/blog/MyCarShipsWithCrapware.aspx) about the terrible interface to the in-dash entertainment system in his new Prius. I’ve got the same one, and I fell victim to the same wow factor when considering the purchase. Once I actually tried to *use* the onboard apps, however, I quickly gave up – too slow, too many clicks, too many unintuitive choices, too few usages that weren’t much more efficient on my smartphone.

I happen to agree with Hanselman – not just about my in-car screen, but the in-TV “smarts” and the soon-to-be-everywhere “smart” appliances. I’d much prefer (at this stage in the “smarts” development) that these lesser apps be removed entirely in favour of just giving me a fully-integrated big screen on which to mirror my already-quite-handy pocket-sized computer. I understand the need for these industries to try to find ways to achieve bigger margins on the sales of these well-established markets. I just believe that these are poorly-executed, lesser-than bolt-ons that add nothing to the primary experience of the device to which they’re attached, and which will be in a few short years a supreme waste of space and an embarrassing relic. I fully expect that I’ll be unable to use *any* of the onboard capabilities of the Prius Entertainment system in three years’ time, and will have to add an aftermarket device or just sell the car to some rube.

I’d personally love to rip and replace the smart interface on my TV with something that was receiving active updates for more than six months from the manufacturer, and which provided me actually-helpful and complementary capabilities I can use right from my TV – and which aren’t just easier and more intuitive on my phone. How’s about a TV guide wired right into my TV? Or something that told me how much TV I’ve watched for the past month or year, and a breakdown of what kinds of shows I’ve watched? (Not that I’d find that info indispensable, but at least it would relate directly and more tightly with the device from which it derived.) How’s about a remote upload capability (push only, no pull – no need to freak out the privacy dudes) for all that data – and more, like power consumption and device health statistics, so I could do something useful and more permanent with that data?

And as for the fridge: how’s about a sensor that tells me how “empty” the fridge is, giving me a clue I should go shopping soon? This could be based on how much power it’s taking to cool the contents each day – or how much the fridge weighs (compared to an average of the last six max weight measures). Or what if the fridge could actually pinpoint where that foul smell is coming from – and better, could give you a warning when the crisper is getting more “moist” (i.e. more “rotty”) than it should be.
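
The weight idea is simple enough to sketch. This is only a toy under stated assumptions: the readings are the weight of the contents in kilograms (the fridge's own weight already subtracted), the baseline is the average of the last six maximum readings as suggested above, and the 50% threshold and the function names are hypothetical choices of mine.

```python
def fullness(current_weight_kg, recent_max_weights_kg):
    """Current contents weight as a fraction of the recent 'full' baseline."""
    last_six = recent_max_weights_kg[-6:]  # only the six most recent maxima
    if not last_six:
        raise ValueError("need at least one previous max-weight reading")
    baseline = sum(last_six) / len(last_six)
    return current_weight_kg / baseline


def should_go_shopping(current_weight_kg, recent_max_weights_kg, threshold=0.5):
    """True when the fridge holds less than `threshold` of its usual full load."""
    return fullness(current_weight_kg, recent_max_weights_kg) < threshold


history = [20.0, 22.0, 19.0, 21.0, 20.0, 18.0]  # made-up max readings
print(should_go_shopping(9.0, history))
```

Using a rolling baseline rather than a fixed "full" weight means the hint adapts to how a particular household actually stocks its fridge.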

That would be a smart device I would actually appreciate.

Installshield – great for developers, sucks ass for victims (aka everyone else)

Holy crap, what a safari this is turning into.  I’m trying to uninstall a piece of software (the Intel Debugger v9.1.x) that was apparently packaged with Installshield.  However, every time I tell Windows to uninstall it, it returns to me the following error:

image

Idiot move #1: look for help from InstallShield Corp

So I Google for this error and come up with a half-dozen links to various “support” articles from InstallShield and related on how to resolve this error.  It tells me to download various versions of their runtimes and engine installers, none of which make any improvement in the situation.

I went away for a while, came back again today and tried a whole different attack:

Forget the vendor, just debug it yourself (using Process Monitor)

  • Launch Process Monitor, filtering out all running processes except msiexec.exe
  • Near the end we finally see some activity that’s related to the problem:

MsiExec.exe    RegQueryKey    HKCR\CLSID\{8B1670C8-DC4A-4ED4-974B-81737A23826B}\LocalServer32

MsiExec.exe    RegQueryValue    HKCR\CLSID\{8B1670C8-DC4A-4ED4-974B-81737A23826B}\LocalServer32\(Default)    Data: C:\PROGRA~1\COMMON~1\INSTAL~1\Driver\8\INTEL3~1\IDriver.exe

  • Then there are four attempts to launch the IDriver.exe, all of which immediately halt
  • Lastly, there’s an update to the MSI log file which says this:
1: The InstallScript engine on this machine is older than the version required to run this setup.  If available, please install the latest version of ISScript.msi, or contact your support personnel for further assistance.
=== Logging stopped: 09/06/2008  14:02:26 ===

At least I know which file is “older than the version required”.

However, the next problem is figuring out how to get the ‘right one’ executed in its place:

Where Your Hero* Learns Just How Screwed Up InstallShield’s Model Really Is

* aka “just some dick on the Internet”

From what I can tell, Installshield only cares about one person: the dork who blindly builds the Installer package for their one little application.  Apparently, if you need to call on the Installshield components, don’t ever even try to discover whether they’re already installed on the target.  Instead, assume that they must *not* be installed (presumably because every developer on the planet has the privilege of being the first to get software installed on each PC where it’s being used), and always install a copy of some Installshield dependency on the end-user’s PC.  And then for good measure, make sure that there’s a hard-coded dependency on the version of the InstallShield bits that went with the installer.

They sure as hell don’t seem to care about the lowly end-user or IT administrator, who might have to actually *deal* with the nightmare of conflicting/overwriting/installed-to-every-conceivable-corner-of-the-filesystem versions of these hard-coded InstallShield dependencies.

Just for s**ts and giggles, try this at home:

  • fire up REGEDIT.EXE
  • Press [Ctrl]-F to bring up the Search dialog
  • Type Installshield and [Enter]
  • Press [F3] a few dozen (or hundred) times

A bit more digging on my own system:

Current version of IDriver.exe in the logged directory = 8.0.0.123

One article in the InstallShield/Macrovision/Acresso library confirms the noted location is where the IDriver.exe version 7 or 8 should be found.

Once more, I downloaded and installed the latest InstallScript 8 package (which turns out to be the 8.0.0.123 I already had installed), so I then decided to try downloading all the later versions and install them one by one as well.  I was hoping that the Registry setting that resolves to this particular IDriver.exe would be overridden (at least in the “Version Independent ProgID” or something similar) by a later install.  Here’s one set of settings that I figured were related:

  • CLSID: {8B1670C8-DC4A-4ED4-974B-81737A23826B}
  • (Default) value: InstallShield InstallDriver
  • AppID: {1BB3D82F-9803-4d29-B232-1F2F14E52A2E}
  • LocalServer32: C:\PROGRA~1\COMMON~1\INSTAL~1\Driver\8\INTEL3~1\IDriver.exe
  • ProgID: ISInstallDriver.InstallDriver.1
  • VersionIndependentProgID: ISInstallDriver.InstallDriver

Yep, after installing IScript9.msi, the CLSID under the entry HKCR\ISInstallDriver.InstallDriver changed to {B3EDE298-AE75-4A1C-AB7E-1B9229B77BBE}.  However, the uninstall “Fatal error” continued to crop up.  Apparently the fatal application’s uninstaller doesn’t chase the ProgIDs but some other reference instead.

Then by some wild fortune, I happened to stumble on a very obscure directory in which the “later version” of the InstallScript MSI installer (isn’t there some irony embedded in that?) was actually still cached.  WHY this wasn’t available from the vendor’s own web site, I’ll never know.  However, installing this version of IScript.msi did overwrite the ProgID once again, and the version of IDriver.exe installed in the target location was 8.1.0.293.
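
I have no idea how InstallShield actually performs its “older than the version required” check, but the comparison itself is easy to sketch: split each dotted version into numbers and compare them piece by piece. The function names below are my own, and the two version strings are the ones seen on my system.

```python
def parse_version(text):
    """Turn a dotted version string such as '8.0.0.123' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_older(installed, required):
    """True when the installed version sorts before the required one."""
    return parse_version(installed) < parse_version(required)


print(is_older("8.0.0.123", "8.1.0.293"))
```

Comparing the parsed numbers matters: as plain strings, "8.0.0.123" would sort before "8.0.0.99", which is the wrong answer numerically.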

Somehow, finally, that did the trick.  Finally got that Installshield-driven crap off my system.  Trying to resist the impulse to wipe all traces of Installshield product off my system as well, and reminding myself that I could create this same hell for myself ten times over in so doing.

So, apparently all it takes is for some ancient application to overwrite the better version of an “Installation Engine”, and all hell breaks loose.  I’m beginning to see why the Installshield product line has been bought and sold more times than… well, I’m drawing a blank on a family-friendly comparison, so let’s just say this was not one of the more profitable software businesses out there.  And no freakin’ wonder.

ATOM, RSS & feeds – have YOU ever known which was the "right" one to choose??

What a friggin relief:

http://dev.live.com/blogs/devlive/archive/2008/02/27/213.aspx

“Microsoft is making a large investment in unifying our developer platform protocols for services on the open, standards-based Atom format (RFC 4287) and the Atom Publishing Protocol (RFC 5023).”

Finally I can try to stop worrying about which of the jillion “feed types” I should select to make sure that the feed subscriptions I’ve amassed are as “future-proof” as possible.  Good gravy, the number of times I’ve gone to subscribe to a certain feed, only to be faced with the choice among 2-6 different feeds (all apparently for the same set of articles) is just paralyzing:

  • Do I want the one with RSS in the final suffix?
  • Should I try to figure out which protocol is most popular/widest supported?
  • Do I need to try to figure out which one provides the most metadata with each downloaded article?
  • And which one might be sending the least data back to the author?  [This is ParanoidMike after all… wouldn’t want to disappoint my fans with a rare moment of rational thinking now]
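
If you'd rather not trust the URL suffix, one option is to fetch each candidate and look at the root element of the XML. The sketch below does only that, with Python's standard library; the function name `feed_format` is made up, and a real feed reader would need to cope with malformed XML and many more variants than these three.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # namespace defined by RFC 4287
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # used by RSS 1.0


def feed_format(xml_text):
    """Guess the feed family from the document's root element."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    if root.tag == "rss":
        return "rss " + root.get("version", "unknown")
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rss 1.0 (RDF)"
    return "unknown"


print(feed_format('<feed xmlns="http://www.w3.org/2005/Atom"></feed>'))
print(feed_format('<rss version="2.0"><channel/></rss>'))
```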

I don’t normally like any one behemoth [aside: wasn’t that one of Godzilla’s opponents?] dictating to me a single format for anything, and I’m especially wary of any such edicts from Microsoft (having been privy to watching the sausage get made there for six years), but any time they make such an unequivocal commitment to an RFC standard and away from their “not built here” crap, I’m all in favour.

As David Hsing says: Best. Troll. Ever.

Holy crap that’s funny:

http://talkback.zdnet.com/5208-12355-0.html?forumID=1&threadID=31199&messageID=579806&start=43

Reproduced here for those (like me) too lazy to click through:

You are kidding aren’t you?  Are you saying that this linux can run on a computer without windows underneath it, at all?  As in, without a boot disk, without any drivers, and without any services?

That sounds preposterous to me.

If it were true (and I doubt it), then companies would be selling computers without a windows.  This clearly is not happening, so there must be some error in your calculations.  I hope you realise that windows is more than just Office?  It’s a whole system that runs the computer from start to finish, and that is a very difficult thing to achieve.  A lot of people don’t realise this.

Microsoft just spent $9 billion and many years to create Vista, so it does not sound reasonable that some new alternative could just snap into existence overnight like that.  It would take billions of dollars and a massive effort to achieve.  IBM tried, and spent a huge amount of money developing OS/2 but could never keep up with Windows.  Apple tried to create their own system for years, but finally gave up recently and moved to Intel and Microsoft.

It’s just not possible that a freeware like the Linux could be extended to the point where it runs the entire computer from start to finish, without using some of the more critical parts of windows.  Not possible.

I think you need to re-examine your assumptions. 

YES (for the sarcasm-impaired), this is a joke, and it’s NOT my writing.  Don’t bitch at me if you are rabidly anti-Windows — click on the link above and rant away to your heart’s content.

Debugging a Word 2003 runaway thread…but not successfully

I just experienced one of the usual “hangs” in Microsoft Word 2003 that happen pretty regularly when working on multiple, large documents for any significant length of time.  The WINWORD.EXE process is taking up 50% of my CPU (which, on a dual-core processor, means there’s a thread somehow taking up 100% of the logical CPU on which it’s scheduled), and has been doing this for at least ten minutes now with no letup.
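
The arithmetic behind that reading, as a tiny sketch. It assumes the whole-machine percentage is spread evenly across logical CPUs and that a single thread is responsible for all of the process's usage; the function name is my own.

```python
def share_of_one_cpu(total_cpu_percent, logical_cpus):
    """Percent of a single logical CPU consumed, assuming one busy thread."""
    return total_cpu_percent * logical_cpus


print(share_of_one_cpu(50, 2))
```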

In my experience, these “runaway consumers of CPU cycles” just never quiesce — eventually I have to decide to kill WINWORD.EXE from Task Manager or Process Explorer, or else the offending process will consume that “CPU” from now until the end of time.

Maybe I was just bored today, ’cause rather than just kill the runaway process, I decided to see if I could dig a little deeper.  [I think Mark Russinovich has infected me with the idea that these are surmountable problems — though I wouldn’t dream of trying to make a favourable comparison between my haphazard hacking and Mark’s mad skillz.]

Process Explorer

Let’s have a look at a few screenshots, shall we?

image 
(Performance stats, in case that’s useful to anyone — though it doesn’t provide me any telling evidence)

image
(Listing of the threads currently instantiated in WINWORD.EXE including the main thread, which is the one causing all the problems)

image
(Stack contents for the WINWORD.EXE thread)

image
(Stack contents for GdiPlus.DLL thread, which was the only other thread with any activity under the “CSwitch Delta” heading)

Process Monitor

Once I decided to investigate, I fired up Process Monitor and limited it to WINWORD.EXE.  The activity logged is almost entirely like this:

image

Don’t strain your eyes too badly on this — I’ve included this just to note the incessant nature of the major activity here: a rapidly-repeating WriteFile operation on a single Temporary file (~WRS1954.tmp), interrupted once in a while by a smaller (Length of anywhere between 512 and 3072) ReadFile operation on the same file:

image

Interestingly, these ReadFile operations occur in an irregular but repeating pattern:

image

Also of note is the fact that this temporary file is constantly growing in size, and not just temporarily swelling the data stored within a pre-allocated file. I confirmed that by right-clicking on the Path in Process Monitor, choosing “Jump to Location…” and simply refreshing the folder to watch the reported file Size increment every time (over a span of 50 minutes, it grew by approx. 222 MB, or 233657856 bytes).
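
Working from the byte count quoted above, a quick back-of-the-envelope conversion shows the scale of the growth: a little under 223 MiB in total, or somewhere around 76 KiB per second on average. The constants come from my observation; the helper names are just for illustration.

```python
GROWTH_BYTES = 233657856  # observed growth of the temp file
WINDOW_MINUTES = 50       # observation window


def bytes_to_mib(n):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n / (1024 * 1024)


def growth_rate_kib_per_s(n, minutes):
    """Average growth in KiB per second over the given number of minutes."""
    return (n / 1024) / (minutes * 60)


print(f"{bytes_to_mib(GROWTH_BYTES):.1f} MiB total")
print(f"{growth_rate_kib_per_s(GROWTH_BYTES, WINDOW_MINUTES):.0f} KiB/s on average")
```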

If I look closer at the Handles for WINWORD.EXE, I notice that this is one of many Temporary files held open by Word, which implies that the problem we’re experiencing is very specific to one type of unexpected activity (and not just affecting Word’s handling of Temporary files):
image
(Note: I intentionally “hashed” out the original filename, which is the last entry in the list pictured.)

One other piece of information: I tried to resize the Window in which the active document was being displayed.  Word appended “(Not Responding)” to its Title Bar, and that seems to have changed the behaviour profile of the WINWORD.EXE thread.  Since that point in time, Process Monitor did not record any further increase in the size of the ~WRS1954.tmp file, but recorded one additional ReadFile operation on the WINWORD.EXE file itself (Offset: 3998720, Length: 4096).  [WINWORD.EXE File version = 11.0.8169.0, Digital signature timestamp = May 31, 2007 12:38:03 PM]

Finally, I grabbed a full memory dump of the WINWORD.EXE process, using windbg.exe and the .dump /ma command.  I can’t say I know much about debugging a dump file, but I’ve got it on the off-chance that I ever find a good guide to debugging.

What Caused This?

Three circumstances I think contributed to this, though in my opinion none of them should lead to a hung process (since I’ve done this more often without incident):

  1. I had opened a Word 2003 document directly from Outlook (it was attached to an email).
  2. The document had Track Changes enabled, and I’d already added Comments throughout the document.
  3. In the Comment I was just editing, it had scrolled off screen…
    image
    …and I had just attempted to apply formatting (I’d typed [Ctrl]-B and [Ctrl]-I rapidly, to bold and italicize) to a single word in the Track Changes panel below the document (the one that opens automatically when you keep typing in Comments that have already “scrolled off screen”).
     image
    (Note: I intentionally redacted the confidential text — but it sure ain’t artistic)

Caveat: While my experience with Word over the years has taught me that heavy use and abuse of the Comments feature leads to instability, I’m still miffed that I’d lose the recent batch of edits just because I’d foolishly tried to emphasize my point using basic formatting in a Comment.

So What Can We Conclude So Far?

I don’t know much about reading a stack trace, so this is all guesstimation on my part (plus a little intelligence gathered from a good Russinovich article).  The WINWORD stack indicates that Word has called ntkrnlpa.exe aka the Windows kernel.  It looks like it’s basically stalled (KiDispatchInterrupt) while creating a new thread (KiThreadStartup).  Looking lower in the stack, the first caller in WINWORD is labelled only “0x1a772b” — whatever that is, it’s beyond my skills to unearth the identity of that API.

The next one down in the stack, however, is wdGetApplicationObject().  There’s no information in MSDN that references this function, though a few pages on the ‘net do allude to it (mostly in the same kinds of searches I made).  The best info I could find was here, which I’m guessing is Word’s way of getting a handle to the overall Word “application object”.  However, without any further context, it’s very hard to imagine what is really going on here.

Turning to the GdiPlus stack, it looks like another kernel call that’s stalled (many similar references to “WaitForMultipleObjects” functions), all boiling down to a call to the GdipCreateSolidFill() API.  From what MSDN documents, this seems like a pretty innocuous function, having nothing to do with temporary files, only to do with UI.  I can understand this — by the time I’d looked at the GdiPlus stack, I believe the UI had “hung” (aka it was non-responsive).  So while this thread was also active, it’s almost impossible for it to be involved in this issue.

Then the only thing I know for sure is the temp file was growing due to some runaway operation, and the runaway operation (which was probably related to an attempt to format Comment text) at some point obtained a handle to the Word application object.

I’m guessing that the only way to get any closer to the root cause would be to dig into the memory dump.  And…bugger me, the dump I grabbed ended up with this as its STACK_TEXT (from !analyze -v):

0adaffc8 7c9507a8 00000005 00000004 00000001 ntdll!DbgBreakPoint
0adafff4 00000000 00000000 00000000 00000000 ntdll!DbgUiRemoteBreakin+0x2d

Guess that’s “the wall” for me.