What does it really mean to Prevent Buffer Overruns in Managed Code, Michael Howard?

One of the reasons I’m spending so much of my free time writing code (and neglecting my wife and dogs, much to their chagrin and my isolation) is that I’m trying to personalize the lessons of developing code, and developing secure code, that I preach as part of my day-to-day job.

I’ve been seeing a lot of references to “don’t trust user input”, and I’ve been trying to figure out what I’m supposed to do in managed code.  What I’m really after are some code samples or some prescriptive guidelines.

Of all the resources I know of on the subject, I suspect the best guidance I’ll find is in the book 19 Deadly Sins Of Software Security: Programming Flaws and How To Fix Them (Howard, LeBlanc, Viega).  I flipped through this a couple of months ago and while it seemed heavily weighted towards unmanaged code (C and C++), I seem to remember a reasonable amount of mention of managed code as well.

When I dug into the table of contents, there wasn’t any one chapter entitled “don’t trust user input”.  Instead there are titles like “Sin 1: Buffer Overruns”, “Sin 2: Format String Problems”, “Sin 3: Integer Overflows”, “Sin 4: SQL Injection”, “Sin 5: Command Injection” and “Sin 14: Improper File Access”.  [I believe these are all the sins that relate to trusting user input, but I’m sure that’s hardly all the ways that trusted user input can be harmful to your code’s health!]

Sin 1: Buffer Overruns

So it looks like this is the most significant of all the Sins to consider when developing managed code.  Not only does it encapsulate the kind of thinking that should be applied to the other Sins, but it’s also the most prevalent issue to expect in managed code, and it applies to all types of managed code applications.

While I’ve understood for years what a buffer overrun means in general, I’ve never paid too much attention to thinking through exactly how to implement protections against buffer overruns.  What’s worse is, the guidance for managed code developers in this book isn’t exactly crystal-clear (at least, not to a relative novice like me):

C# enables you to perform without a net by declaring unsafe sections; however, while it provides easier interoperability with the underlying operating system and libraries written in C/C++, you can make the same mistakes you can in C/C++. If you primarily program in higher-level languages, the main action item for you is to continue to validate data passed to external libraries, or you may act as the conduit to their flaws.

So what does this mean to the managed code developer?  Am I reading this right, that we should only have to worry about calls to unmanaged code, and that all managed code functions are perfectly fine as-is?  Or is this also trying to say that any calls between assemblies, whether managed-managed code or managed-unmanaged code, should be equally guarded so that all passed buffers are checked?

Let’s assume for the moment that it’s the former, and that only when we’re calling into an unmanaged (PInvoke) function do we need to worry about protecting against buffer overruns.  Should we assume that every single PInvoke needs to be protected against buffer overruns, no matter what?  Or should we focus instead on following external user inputs, tracing them through our code, and only put guard code in place at one or more of those chained calls, when that external input will actually intersect with a PInvoke function?

Put another way, does this advice mean we should focus on the “back end” (protecting every PInvoke), or should we focus on the “front end” (tracing external input to any PInvoke)?

I have no real appreciation for this space, and I can imagine good reasons for taking either approach.  However, I also don’t relish the thought of either approach.  I’d hate to have to try to trace every external input all the way through the twisty paths that it’ll often take — what a nightmare for a large codebase (what a grueling code review that’d be)!  On the other hand, it seems really inefficient to have to wrap every PInvoke in some form of guard code (or worse, wrap every call to the PInvoke – thus duplicating the extra code over and over, and still leaving yourself open to overlooking one or more critical calls).

And hey — if every PInvoke should always be wrapped in anti-overrun guard code, then shouldn’t the Microsoft employee who runs PInvoke.net be aware of that, and be ensuring that such guard code is included in every PInvoke signature that’s documented on that site?  Based on this reasoning, I’d have to believe that it’s not practical — or not even theoretically effective — to try to protect against buffer overruns in the PInvoke signatures.
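For what it’s worth, here’s a minimal sketch of what the “back end” option could look like: one guarded wrapper per PInvoke, so the check lives in a single place instead of at every call site. It uses the Win32 GetUserName function from advapi32.dll purely as an illustration. The wrapper name GetCurrentUserName, the module name and the 257-character capacity (UNLEN of 256, plus one for the terminating null) are my own assumptions, not something taken from the book or from PInvoke.net.

```vb
Imports System
Imports System.ComponentModel
Imports System.Runtime.InteropServices
Imports System.Text

Public Module SafeNativeMethods
    ' Raw PInvoke declaration, kept Private so nothing can call it unguarded.
    <DllImport("advapi32.dll", CharSet:=CharSet.Unicode, SetLastError:=True)> _
    Private Function GetUserName(ByVal lpBuffer As StringBuilder, ByRef nSize As Integer) As Boolean
    End Function

    ' UNLEN (256) + 1 for the terminating null character.
    Private Const MaxUserNameChars As Integer = 257

    ' The only public entry point. It owns the buffer, so a caller can never
    ' hand the native function a size that disagrees with the real buffer.
    Public Function GetCurrentUserName() As String
        Dim buffer As New StringBuilder(MaxUserNameChars)
        Dim size As Integer = buffer.Capacity
        If Not GetUserName(buffer, size) Then
            Throw New Win32Exception(Marshal.GetLastWin32Error())
        End If
        Return buffer.ToString()
    End Function
End Module
```

The point of the design is that the unmanaged signature never escapes the module, which sidesteps the “wrap every call” duplication worry. Whether that is what the authors had in mind is still an open question.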

Quick Analysis of the Rest of the “User Input” Sins

Sin 2: Format String Problems

It sounds like the only significant effect of this Sin on managed code is when reading in input from external files.  The recommended “guard code” is to try to be sure you’re reading in the file you want (and not some path– or filename–spoofed substitute).

Sin 3: Integer Overflows

It sounds like the only time this is a problem in managed code is when performing calculations inside unmanaged code.  If I’m reading this right, the recommended “guard code” would check that the integer values passed into the unmanaged code call are in fact integer values.
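One way to picture guard code for this Sin is a range check on any size calculation before it is handed to unmanaged code. This is only a sketch under that assumption; SafeByteCount and its parameter names are hypothetical and do not come from the book.

```vb
Imports System

Public Module SizeGuards
    ' Hypothetical helper: computes elementCount * elementSize without wrapping around.
    Public Function SafeByteCount(ByVal elementCount As Integer, ByVal elementSize As Integer) As Integer
        If elementCount < 0 Then Throw New ArgumentOutOfRangeException("elementCount")
        If elementSize <= 0 Then Throw New ArgumentOutOfRangeException("elementSize")

        ' Multiply in 64 bits so the product cannot overflow, then range-check it.
        Dim total As Long = CLng(elementCount) * CLng(elementSize)
        If total > Integer.MaxValue Then
            Throw New OverflowException("Requested buffer size does not fit in an Integer.")
        End If
        Return CInt(total)
    End Function
End Module
```

Worth noting: VB.NET checks Integer arithmetic for overflow by default (unless the “Remove integer overflow checks” project option is turned on), whereas C# does not unless you ask for it with checked.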

Sin 4: SQL Injection

I’m not touching any SQL databases or data access libraries, so this is irrelevant to my current investigations.  If it’s relevant for you, go read everything you can on the subject — it’s a doozy.

Sin 5: Command Injection

No .NET languages are mentioned in this chapter, but I would imagine that anytime a “shell execute” type command is instantiated, this vulnerability could be present.  In such cases, I would follow the same advice they give: “You can either validate everything you’re going to ship off to the external process, or you can just validate the parts that are input from untrusted sources. Either one is fine, as long as you’re thorough about it.”
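A minimal sketch of the “validate the untrusted parts” option in VB.NET might look like the following. The allow-list pattern, the module name and the BuildStartInfo helper are assumptions made up for illustration; a real tool may need a different set of permitted characters.

```vb
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Public Module CommandGuards
    ' Allow-list: letters, digits, underscore, dot and hyphen only.
    ' \z anchors at the true end of the string, so a trailing newline is rejected.
    Private ReadOnly SafeArgument As New Regex("^[A-Za-z0-9_.\-]+\z")

    Public Function IsSafeArgument(ByVal value As String) As Boolean
        Return value IsNot Nothing AndAlso SafeArgument.IsMatch(value)
    End Function

    ' Builds the start info for a hypothetical external tool,
    ' rejecting suspicious input before any process is launched.
    Public Function BuildStartInfo(ByVal toolPath As String, ByVal untrustedArg As String) As ProcessStartInfo
        If Not IsSafeArgument(untrustedArg) Then
            Throw New ArgumentException("Argument contains characters outside the allow-list.", "untrustedArg")
        End If
        Dim info As New ProcessStartInfo(toolPath, untrustedArg)
        info.UseShellExecute = False ' start the executable directly, not through the shell
        Return info
    End Function
End Module
```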

Sin 14: Improper File Access

It sounds like there are no easy “rules” to implement as guard code for this class of flaw; instead, the advice is to be hyper-vigilant anytime managed code calls System.IO.File or StreamReader methods.
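One check that can at least be written down is canonicalizing a path and confirming it still lives under the folder you expected, before opening it. The sketch below assumes that this is the kind of vigilance intended; IsUnderFolder is a hypothetical helper, and it does nothing about other file-access problems such as race conditions or device names.

```vb
Imports System
Imports System.IO

Public Module PathGuards
    ' Returns True only if candidatePath, once canonicalized, is inside baseFolder.
    Public Function IsUnderFolder(ByVal baseFolder As String, ByVal candidatePath As String) As Boolean
        Dim root As String = Path.GetFullPath(baseFolder)
        Dim separator As String = Path.DirectorySeparatorChar.ToString()
        If Not root.EndsWith(separator, StringComparison.Ordinal) Then
            root &= separator
        End If

        ' GetFullPath resolves ".." segments, so a traversal attempt shows up here.
        Dim full As String = Path.GetFullPath(Path.Combine(root, candidatePath))
        Return full.StartsWith(root, StringComparison.OrdinalIgnoreCase)
    End Function
End Module
```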

Note to self: review these VSTO articles

[aside: I have to remember to review these articles for any tricks that’ll help me troubleshoot/improve the VBA-to-VSTO conversion I’m doing for Word2MediaWiki++…]

Migrating a VBA Solution to a Visual Studio Tools for Office Add-In

Migrating Word VBA Solutions to Visual Studio Tools for Office

Convert VBA Code to Visual Basic When Migrating to Visual Studio 2005 Tools for Office

John R. Durant’s Consolidated List of Word 2003 Developer Resources

…and as a catch-all:

VSTO Forum: Non-VSTO Question/Issue Resources

Just one of the many reasons why Vista pisses me off…

I’ve spent the better part of three nights a week, for at least a month, trying to figure out how to reinstall my Linksys WUSB54G USB Network Adapter.  I’d bought this nice little device a little while ago, and I was foolish (!?!) enough to think that I could disconnect it and plug it into any old USB port on my Vista PC, and have it work again.  [After this many years of working with USB devices in this manner, what was I thinking ?!?]

Instead, I found out when I plugged it back in that its attempts to “reinstall the driver” (during creation of the “new” device – oops, I guess plugging it into a different USB port was NOT to Vista’s liking) were being stymied by one of the most impenetrable errors I’ve ever encountered: ERROR_DUPLICATE_SERVICE_NAME.  Oh sure, you’d think this’d be an easy one to resolve, eh?  Sure – just try to find the duplicated name anywhere in the Services hive of the Registry.  There was nothing with “Linksys” in the name, and simply deleting anything with “Linksys” or “WUSB54G” in any key, value or data didn’t cut it.  Vista still bitched about the duplicate name.

The error has plenty of references online (e.g. peruse here or here), but no one seemed to have any decent solutions on resolving this for any of the Linksys network devices that were at all similar to the one I have.  Plenty of speculation, just no good results.

Yes, I tried KB 823771, I’ve tried crawling through the SETUPAPI.LOG file, and I’ve tried a number of other brick walls to bang my head against.  The closest I got with the SETUPAPI.LOG was to look for references to “xxxxx” (can’t recall what that said exactly anymore), as in:

#E279 Add Service: Failed to create service “xxxxxx”. Error 1078: The name is already in use as either a service name or a service display name.
#E033 Error 1078: The name is already in use as either a service name or a service display name.
#E275 Error while installing services. Error 1078: The name is already in use as either a service name or a service display name.
#E122 Device install failed. Error 1078: The name is already in use as either a service name or a service display name.
#E154 Class installer failed. Error 1078: The name is already in use as either a service name or a service display name.
#I060 Set selected driver.

Aside: Why I Hate Vista

I’m having a bitch of a time trying to get Vista to preserve a network connection through its Sleep & Resume states.  I know that part of it is the fact that the networking hardware vendors haven’t written solid, stable drivers for Vista, but considering how widespread this issue is (even to this day — what, almost a year since release?), it’s really making me more frustrated with Vista [or perhaps it’s really I’m just pissed off at myself for having bought into the hype around Vista, when all it’s been for me since bringing it home has been needless hardware replacement and constant crashes, freezes, and troubleshooting].

This is the third network device I’ve purchased for my Vista box, and the third one that has had driver issues.  The first one just didn’t have a Vista driver, and the claimed “should be compatible” XP driver just gave Vista too many bluescreens.  The second one had a Vista driver and really good reviews on newegg.com, but the device would lose its driver as soon as Vista went to Sleep (and then resumed), and wouldn’t reload until I rebooted the box.  I’m not kidding — I spent a month trying to get that one to work like it should’ve.

I’ve been a Windows bigot for most of my adult life, and I even spent six years working for Microsoft, every day spent trying to make sure that Windows would work reliably and securely for my customers.  If *I* have this much trouble with Vista, my sympathies to those of you who’ve been trying to get by on just being a *part*-time Windows geek.  [And my sarcasm should be apparent, as I am firmly of the belief that *no* one should have to learn the ins-and-outs of a computer, just to be able to operate it.  If you *want* to geek out, by all means c’mon aboard.  But if you have *other* interests, then the device should be your servant — not the other freakin’ way around.]

Resolution (?)

What did I finally do that did (or seems to have done) the trick?

I finally went through the Registry and deleted any key that in any way shape or form referred to “USB\VID_13B1”.  The HARDWAREID for the Linksys WUSB54G USB Network Adapter is USB\VID_13B1&PID_000D (or some derivative thereof), and while this was never mentioned as the source of the error in any of the logs I crawled through, it finally seemed to me to be the most likely commonality among all the “duplicate names” that must’ve been detected by Vista during the attempted install of the device.  I only found a few such entries, but obviously they were the underlying showstopper for re-introduction of this wireless device into my setup.


Porting Word2MediaWikiPlus to VB.NET: Part 14 (Mysteries Abound)

[Previous articles in this series: Prologue, Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10, Part 11 (The Return), Part 12 (Initialization continued), Part 13 (VBA Oddities).]

Mysterious Character: Vertical Tab (VT) — Do These Still Show Up in Word Documents?

In working through the code in MediaWikiConvert_Lists(), I ran across a block of code that purports to “replace manual page breaks”, and is using the Chr(11) construct to do so.  I must’ve been feeling extra-curious ’cause I went digging into what this means, and the harder I looked, the more puzzled I became.

According to ASCIITables.com, the character represented by decimal “11” is the so-called “vertical tab”.  I’ve never heard of this before (but then, there’s a whole host of ASCII & Unicode characters I’ve never paid attention to before), so I had to check with a half-dozen other references on the ‘net before I was sufficiently convinced that this wasn’t some “off-by-one” problem where the VBA coders were intending to look for Chr(10) (aka “line feed”) or Chr(12) (aka “form feed”).

On the assumption that we’re really and truly looking for “vertical tab”, I had to do some deep digging to figure out what this might actually represent in a Word document.  There’s the obligatory Wikipedia entry, which only said that “The vertical tab is  but is not allowed in SGML (including HTML) or XML 1.0.”.  Then I found this amusing reference to one of the Perl RFCs, which quotes Russ Allbery to say “The last time I used a vertical tab intentionally and for some productive purpose was about 1984.”.  [Sometimes these quotes get better with age…]

OK, so if the vertical tab is so undesirable and irrelevant, what could our VBA predecessors be thinking?  What is the intended purpose of looking for an ASCII character that is so unappreciated?
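As far as I know, Word still uses character 11 internally: it is how a manual line break (Shift+Enter) shows up when you read a document’s text, while a manual page break is character 12. Treat that as an assumption to verify against a real document. Under that assumption, a minimal VB.NET sketch of handling it could be the following, where ReplaceVerticalTabs and the choice of an HTML line break as the substitute are hypothetical.

```vb
Imports System
Imports Microsoft.VisualBasic

Public Module LineBreaks
    ' Replaces each vertical tab (Chr(11)) with an HTML line break.
    Public Function ReplaceVerticalTabs(ByVal text As String) As String
        Return text.Replace(vbVerticalTab, "<br />")
    End Function
End Module
```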

Mysterious Code Fragment: “If 1 = 2” – WTF?

I started to notice these odd little appendages growing out of some of the newer code in the VBA macro.  At first I figured there must be some special property of VBA that makes “If 1=2” a valid statement under some circumstances, and I just had to ferret out what that was.

Instead, the more I looked at it, the more puzzled I became.  What the hell could this possibly mean?  Under what circumstances would *any* logical programming language ever treat “If 1 = 2” as anything but a comparison of two absolute numbers, that will ALWAYS evaluate to False?

Eventually I had to find out what greater minds than mine thought about this, and so off to Google I went.  As you might expect, there’s not much direct evidence of any programming practices that include adding this “If 1 = 2” statement.  In fact, though it appears in the odd piece of code here and there, it’s surprisingly infrequent.  However, I finally ran across what I take to be the best lesson on what this really means (even if I had to unearth it through the infamous “Google cache”):

>>>Anyone know how to comment out a whole section in VBA rather than just
>>>line by line with a " ' "?
>>If the code is acceptable (won’t break because some control doesn’t
>>exist, etc), I sometimes do
>> If 1 = 2 then
>> ….existing code
>> End If
>>The code will never fire until the day 1 = 2.
> Thanks, think Id prefer the first option. The second option might
> confuse any programmers that try and read my code.

Now that’s the understatement of the year.

So as far as I’m concerned, I’m going to go back and comment out any and all instances where I find this statement, as it tells me the original programmer didn’t want this code to fire, and was thinking of coming back to it someday after their last check-in.
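For the VB.NET port there is a less confusing way to park a block of code than “If 1 = 2”: the conditional compilation directives. A minimal sketch follows; the module and function names are made up for illustration.

```vb
Imports System

Public Module DeadCodeDemo
    Public Function Describe() As String
        Dim result As String = "live code ran"

#If False Then
        ' Everything between #If False and #End If is skipped by the compiler,
        ' so it never runs and is not even checked for errors.
        result = "disabled code ran"
#End If

        Return result
    End Function
End Module
```

Unlike “If 1 = 2”, the directive makes the intent obvious to the next reader, and the disabled block does not have to keep compiling as the surrounding code changes.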

Mysterious Approach: Localization via Macro?  No way.

There are a few routines that attempt to implement localization at runtime.  While this makes sense for VBA, this makes little if any sense for the use of VB.NET.  Any English-only strings can be substituted in the corresponding Resources file that will accompany this code.

Thus, the MW_LanguageTexts() routine will be skipped, since it had little if any effect anyway.

Mysterious Exception: “add-in could not be found or could not be loaded”

I’ve been struggling for a few days to try to actually run this add-in, and after finding out why, I can say with confidence that there was no good troubleshooting guide for this.

Here’s the setup:

  • I could Build the add-in just fine — no build-time errors, only two compiler warnings (about unused variables).
  • However, when I tried to either (a) Debug the project from within Visual Studio, or (b) add the add-in manually to Word, I was completely stymied.
  • When I started the Debug sequence (F5) from Visual Studio, it would launch Word 2003, which created all its default menus and toolbars, and then threw this error dialog:
    Office document customization is not available - An add-in could not be found or could not be loaded.
  • The details of this exception read:
    Could not create an instance of startup object Word2MediaWiki__.ThisAddIn in assembly Word2MediaWikiPlusPlus, Version=, Culture=neutral, PublicKeyToken=1a75eafd9e81be84.

    ************** Exception Text **************
    Microsoft.VisualStudio.Tools.Applications.Runtime.CannotCreateStartupObjectException: Could not create an instance of startup object Word2MediaWiki__.ThisAddIn in assembly Word2MediaWikiPlusPlus, Version=, Culture=neutral, PublicKeyToken=1a75eafd9e81be84. ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.NullReferenceException: Object reference not set to an instance of an object.
       at Word2MediaWiki__.Word2MediaWikiPlusPlus.Convert..ctor() in C:\VS2005 Projects\Word2MediaWiki++\Word2MediaWiki++\Convert.vb:line 44
       at Word2MediaWiki__.ThisAddIn..ctor(IRuntimeServiceProvider RuntimeCallback) in C:\VS2005 Projects\Word2MediaWiki++\Word2MediaWiki++\ThisAddIn.vb:line 29
       --- End of inner exception stack trace ---

  • If I tried to load the add-in from within Word (using the Tools > COM Add-ins… menu — which you can add with these instructions), Word would only tell me:
    Load Behavior: Not loaded. A runtime error occurred during the loading of the COM Add-in.

    I won’t even bore you with the details of all the stuff I tried to do to debug this issue.   It turned out that I was instantiating my Application object too early in the code (at least, the way I’d constructed it).

    Broken Code

    ThisAddin.vb (relevant chunk)

    Imports Office = Microsoft.Office.Core
    Imports Word2MediaWiki__.Word2MediaWikiPlusPlus.Convert
    Public Class ThisAddIn
    #Region " Variables "
        Private W2MWPPBar As Office.CommandBar
        WithEvents uiConvert As Office.CommandBarButton
        WithEvents uiUpload As Office.CommandBarButton
        WithEvents uiConfig As Office.CommandBarButton
        Dim DocumentConversion As Word2MediaWikiPlusPlus.Convert = New Word2MediaWikiPlusPlus.Convert ' Line 29
    #End Region

    Convert.vb (relevant chunk)

    Imports Word = Microsoft.Office.Interop.Word
    Namespace Word2MediaWikiPlusPlus
    Public Class Convert
    #Region "Variables"
            Dim App As Word.Application = Globals.ThisAddIn.Application 'PROBLEM - Line 44
            Dim Doc As Word.Document = App.ActiveDocument 'PROBLEM
    #End Region
    #Region "Public Subs"
            Public Sub InitializeActiveDocument()
                If Doc Is Nothing Then
                    Exit Sub
                End If
            End Sub

    #End Region


    Fixed Code

    Convert.vb (relevant chunk)

    Imports Word = Microsoft.Office.Interop.Word
    Namespace Word2MediaWikiPlusPlus
    Public Class Convert
    #Region "Variables"
            Dim App As Word.Application 'FIXED 
            Dim Doc As Word.Document 'FIXED 
    #End Region
    #Region "Public Subs"
            Public Sub InitializeActiveDocument()
                App = Globals.ThisAddIn.Application 'NEW
                Doc = App.ActiveDocument 'NEW
                If Doc Is Nothing Then
                    Exit Sub
                End If
            End Sub
    #End Region

    What I Think Went Wrong

    As much as I understand of this, when the ThisAddIn class tries to create a new instance of the Convert class as its DocumentConversion object, the ThisAddIn object itself hasn’t finished being instantiated yet.  As a result, the reference in the Convert class to Globals.ThisAddIn.Application can’t be resolved (how can you get the ThisAddIn.Application object if its parent object, ThisAddIn, doesn’t exist yet?), and that causes the NullReferenceException that is the heart of the problem.

    By pulling out that instantiation code from the App variable declaration, and delaying it instead to one of the Convert class’s Subs, there was no need for the managed code to “chase its tail” — trying to resolve an object reference back through the calling code, which hadn’t been instantiated yet.
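The same ordering problem can be reproduced without Office at all. In the sketch below, FakeGlobals and AddInHost are hypothetical stand-ins for the VSTO-generated Globals class and ThisAddIn; the real VSTO plumbing is more involved, so take this as an illustration of the principle only.

```vb
Imports System

' Stand-in for the VSTO-generated Globals class (hypothetical).
Public Module FakeGlobals
    Public Host As AddInHost
End Module

Public Class AddInHost
    Public ReadOnly ApplicationName As String = "Word"
End Class

Public Class EagerConverter
    ' Mirrors the broken code: the field initializer runs during construction
    ' and dereferences the global while it may still be Nothing.
    Private appName As String = FakeGlobals.Host.ApplicationName
End Class

Public Class LazyConverter
    Private appName As String

    ' Mirrors the fixed code: the lookup is deferred until the host exists.
    Public Sub Initialize()
        appName = FakeGlobals.Host.ApplicationName
    End Sub

    Public ReadOnly Property Name() As String
        Get
            Return appName
        End Get
    End Property
End Class
```

While FakeGlobals.Host is still Nothing, creating an EagerConverter throws a NullReferenceException, whereas a LazyConverter can be created safely and initialized later, once the host has been assigned.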

    Y’know, I’m sure I read somewhere over the last year that combining the declaration with the instantiation of a variable is bound to lead to subtle debugging issues, but man.  Losing three days to this?  What a disaster.

    Lesson for the day: It never pays to take shortcuts.

    Another VSTO app idea? Man, I can’t keep up!

    I’m an avid user of Attensa for Outlook, a free Outlook add-in for aggregating RSS feeds as folders of “messages” in Outlook.  I like it because it (a) allows me to search my feeds quickly via Windows Desktop Search, and (b) lets me read my feeds whether I’m connected to the ‘net or not.

    However, there isn’t currently a free way to read my feeds via a web browser (e.g. from my new iPhone – hee hee!).  Well, I should say I can read my feeds via Google Reader, but my read/unread status doesn’t get sync’ed from Attensa to Google or back.  That means if I bravely skim through a bunch of articles in one place, I’ll likely have to wade through them (or get distracted by them) again in the other.

    I had a brainwave today (stand back, that could be contagious) about how to add functionality to be able to sync back & forth, and I think I’ve just dreamt up yet another coding project for myself:


    I have a pretty reasonable idea how to write managed C# or VB.NET that can integrate with Office via the Visual Studio Tools for Office model.  I’m not unfamiliar with web services, or with the basics of a .NET-based HTTP client [having just wasted a weekend authoring a very rudimentary web site parser].  I am bright enough to imagine that the Attensa add-in exposes a more abstract approach to addressing feeds & articles than just crawling the raw PST file, enumerating folders and addressing message objects directly.

    Now what I’d need to know is: is there an Attensa SDK and/or API which I could leverage in an Outlook application add-in using VSTO?  Would there be any advantage to using that abstraction layer, as opposed to just enumerating the PST folders and messages directly?  If the Attensa team only exposed an unmanaged API, would I be creating a performance nightmare to code through that (with all the PInvoke‘ing that is required) rather than just take my chances with the native Outlook object model?

    I can even imagine that the Attensa client might provide me a way of finding the translation between “articles from feed ‘x’” and “messages in folder ‘y’”, that relied on Attensa’s internal database, and then I could grind through the Outlook folders themselves.  That’d be a damn sight easier than trying to match up (a) feeds from the Google Reader API (article, wiki) to the folders as they’re named in the PST file, and (b) articles from the Google Reader API to the messages stored in the PST file.  It’d sure help if there was an indexed search capability in (a) the Google Reader API and (b) the Outlook PST object model.

    Oh, it’s fun to imagine all the ways I could make my life easier…after six months of hard dev work to get there.  Madman I am.

    Porting Word2MediaWikiPlus to VB.NET: Part 13 (VBA Oddities)

    [Previous articles in this series: Prologue, Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10, Part 11 (The Return), Part 12 (Initialization continued).]

    How to convert the VBA String() Function?

    There’s a more-complicated-than-it-probably-needs-to-be subroutine in the Word2MediaWikiPlus codebase, called MW_SurroundHeader(), that seems to be there only to clean up and reformat text in a Word document that has one of the Heading styles.  It uses a function from VBA called simply String(), which is one of the first cases of a VBA function for which I cannot find an equivalent in VB.NET.

    It turns out I found what I needed in an oreilly.com article, and after running into a few brick walls looking for a reference to this in MSDN, I started a more intelligent search.  I kept coming back to references to the String Data Type, so I next looked at the “Strings in Visual Basic” topic that was referenced by “For more information on string manipulation…”.  From there the next most logical leap was to “Building Strings in Visual Basic”, which led to “How to: Create Strings Using a StringBuilder in Visual Basic”.

    Once there, I figured that since this was so helpful to me, I’d like to save someone the trouble next time so I added a little of that “Community Content” sauce that I myself appreciate so much.
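For the simple “repeat one character n times” case, there are also shorter routes than a StringBuilder: the String constructor that takes a character and a count, and the StrDup function in the Microsoft.VisualBasic namespace. Here is a minimal sketch; SurroundHeader is a hypothetical simplification and not the actual logic of MW_SurroundHeader().

```vb
Imports System
Imports Microsoft.VisualBasic

Public Module HeaderMarkup
    ' Wraps heading text in MediaWiki "=" markers; level 2 gives "== Title ==".
    Public Function SurroundHeader(ByVal title As String, ByVal level As Integer) As String
        ' .NET counterpart of VBA's String(level, "=").
        Dim marks As String = New String("="c, level)
        Return marks & " " & title & " " & marks
    End Function

    ' Same result via the VB runtime helper, for comparison.
    Public Function RepeatWithStrDup(ByVal count As Integer) As String
        Return StrDup(count, "=")
    End Function
End Module
```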

    Converting the Selection Object from VBA?

    The MW_FontFormat() subroutine also uses a no-longer-supported VBA-ism, the Selection object.  This isn’t all that well documented online either — or at least, I wasn’t able to find anything useful online to help figure out how to translate this into VB.NET.  The best I could find was a mention that the Range object in VB shares some common methods & properties with the Selection object in VBA.

    However, I happened to have a copy of an old book called the Microsoft Office XP Developer’s Guide, which was surprisingly results-oriented for an MSPress book.  Pages 176-177 actually discuss “The Selection Object vs. the Range Object”, in which I am told that the Range object is actually superior to the Selection object, and should always be favoured wherever possible.

    I’m not feeling up to the subtleties of Selection vs. Range right now, so I’ll leave this for another time.

    Converting the Font Colour to HTML-compatible values?

    This is another interesting puzzler… It seems that MediaWikiConvert_FontColors() calls RGB2HTML(), which calls OleConvertColor(), which calls OleTranslateColor(), which is a p/invoke to OLEAUT32.DLL.  [Man, this is starting to read like a book of the Old Testament…]

    I have a really strong gut instinct that there’s a managed code equivalent to this that will make the intended conversion in one step, and I intend to find it.  There’s no good reason at this point to (a) have this many calls going on the stack, just to get access to a “simple” math function, or (b) to preserve an unmanaged call just because it’s been used all the way up to now.

    I can think of at least three ways to try to find the managed class I’m after: search on OleTranslateColor, search on “RGB & HTML”, or start browsing books on managed web development.

    According to this “Format Color for HTML” article, the call to OleTranslateColor is only necessary in cases where you’re using “system color constants” or “palette indices”.  Since we’re getting very predictable input here that doesn’t appear to be using either of these two alternatives, right away we should be able to eliminate the unmanaged code.

    That is, if I’m reading this right, then I should just be able to remove OleConvertColor() from the initial call in RGB2HTML() and leave the first line of code as

    nRGBHex = Right("000000" & Hex(rgbColor), 6)

    However, upon double-checking, it seems that other code blocks in the VBA macro are passing in some of the Word.WdColor enumeration constants, which I assume are equivalent to “system color constants”.

    Rather than have the RGB2HTML() routine always thunk down to unmanaged code, it’d be smarter if we checked whether the color value of interest is a member of the Word.WdColor enumeration.  But do the routines that generate the input parameter to RGB2HTML() generate either Long or WdColor values?  Or alternatively, would the code implicitly convert from WdColor to Long as the RGB2HTML() routine initialized?  I didn’t notice any overloaded instances of RGB2HTML() that took the input parameter as a WdColor value, so I have to assume that no matter what goes on outside this routine, all operations inside RGB2HTML() will only operate on colors of type Long.

    If that assumption is correct, then we should be able to safely ignore the possibility that the input parameter may start out as a WdColor datatype, and that means we can safely eliminate the OleConvertColor() and OleTranslateColor() routines.  [For the moment, having already had to dig them back up once, I’ll just comment them out and leave myself a note to delete them once I’ve had time to test these colour conversions and confirm this assumption is true.]
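Here is a sketch of what a fully managed conversion could look like, assuming the input really is a plain RGB value. One thing to watch: VBA’s RGB() stores red in the lowest byte, so Hex() of the raw value comes out in blue-green-red order unless the bytes are swapped somewhere (the original RGB2HTML() may well do that further down). OleRgbToHtmlHex is a hypothetical name, and rejecting values with a non-zero high byte is my own guard for system or “automatic” colours, not something taken from the macro.

```vb
Imports System

Public Module ColorConversion
    ' Converts a plain OLE/VBA RGB value (red in the low byte) to "RRGGBB" hex.
    Public Function OleRgbToHtmlHex(ByVal oleColor As Integer) As String
        ' A non-zero high byte means this is not a plain RGB value,
        ' for example a system colour or an "automatic" colour.
        If (oleColor And &HFF000000) <> 0 Then
            Throw New ArgumentException("Not a plain RGB colour value.", "oleColor")
        End If

        Dim red As Integer = oleColor And &HFF
        Dim green As Integer = (oleColor >> 8) And &HFF
        Dim blue As Integer = (oleColor >> 16) And &HFF
        Return String.Format("{0:X2}{1:X2}{2:X2}", red, green, blue)
    End Function
End Module
```

System.Drawing.ColorTranslator also offers FromOle() and ToHtml(), but ToHtml() can return a colour name rather than hex digits, so the manual version keeps the output predictable.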

    Colours in VBA vs. Colours in .NET

    A more interesting question, however, is whether we’re losing colour fidelity in the conversions being performed here.  According to VSTO For Mere Mortals, Chapter 4, “In VBA, colors are of type Long, and there are eight constants that can be used… In Visual Studio 2005, colors are of type Color, and there are more than 100 choices”.

    Is it possible that the calls being used to derive the colours from the Active document are limited to the VBA colour constants, and that I should be looking to switch to other calls that return the .NET Color constants?  I’ll just add this as another Task to the CodePlex project list, and deal with it later — it seems to me like this is hardly the biggest problem facing this Addin at the moment.

    Porting Word2MediaWikiPlus to VB.NET: Part 12 (initialization continued…)

    [Previous articles in this series: Prologue, Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10, Part 11.]


    Much of this function seems to repeat the actions taken in Word2MediaWikiPlus(), so it’s a bit weird to see it done here as well (since this function is called explicitly by the other).  While some of it can be immediately discarded, other bits have to be examined more closely – mostly because they’re poorly documented (at least at the point from which they’re being called):

    • Again we have an ImagePath enumeration and/or creation
      • There’s an interesting new function I haven’t seen before: IIf(x,y,z)
        • VBA For Dummies tells me it does “test for ‘x’; if true, do ‘y’; if false, do ‘z’”.  Fairly tidy little function there.
      • Looking deeper into what’s going on here, the macro is assigning the ImagePath setting to a folder named “wiki” under the user’s My Pictures folder
      • This doesn’t make a lot of sense for a folder of temporary files that are deleted at the end of the session (or before the beginning of the next)
      • Therefore I’m going to make two changes:
        • this folder will be created as a subfolder of the user’s %TEMP% location
        • this folder will not only be emptied at the beginning of a session, but (as a good citizen of the computer) it will also empty its contents once it has completed a conversion
    • Again we have an EditorPath enumeration
      • it appears that the only path being set is the Microsoft Photo Editor (which we’ve previously confirmed is no longer available)
      • Is there any way to actually perform the image manipulation to which so much code has been devoted?
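The temp-folder plan from the list above could be sketched roughly as follows. The “wiki” folder name follows the macro; the module and method names are hypothetical, and a real add-in would want error handling around files that are still locked.

```vb
Imports System
Imports System.IO

Public Module ImageWorkFolder
    ' Returns the "wiki" folder under the user's temp location, creating it if needed.
    Public Function EnsureFolder() As String
        Dim folder As String = Path.Combine(Path.GetTempPath(), "wiki")
        Directory.CreateDirectory(folder) ' does nothing if it already exists
        Return folder
    End Function

    ' Deletes every file in the folder; intended to run both before
    ' a conversion starts and again once it has completed.
    Public Sub EmptyFolder(ByVal folder As String)
        For Each filePath As String In Directory.GetFiles(folder)
            File.Delete(filePath)
        Next
    End Sub
End Module
```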

    The more I look at this image extraction code, the more complicated it gets.  At this point I’ve pretty much determined that, for all the effort it’ll cost to implement these image features, it’s just not worth the trouble in v1.  I’ll continue to add TODO: comments to the VSTO add-in to show where the image code will eventually go, but I’m not going to do any further work to understand the image code until the rest of the Add-in is working.

    Finally, there are the control characters that are being assigned (^l, ^m, ^p, ^s).  They’re not documented in the code, and I’m having a hard time finding any documentation that discusses the use of these control characters.  It doesn’t help that Google and MSDN Search don’t seem to allow you to search on “^p”; they seem to treat it as either “p” or “<p”.

    I believe I could treat these as global constants in the Convert class, but what isn’t clear is whether these control characters are:

    1. special substitutions in Word, and will get converted to the native Word paragraph/new line/blank/page break code (in which case I should just use the native VSTO/VBA enumerations), or
    2. treated by Word as ASCII text and sent to the Wiki server, which converts them to HTML when displaying the resulting article (in which case I should probably make sure there isn’t a better way to represent these in MediaWiki format).

    Aha!  After trying over & over, I finally came up with a search in Microsoft’s Knowledge Base that gave me an article talking about the “^p” (which it calls a “paragraph mark”):

    WD2000: Text Converted to One-Row Table (Paragraph Marks Ignored)

    These appear to be ancient character sequences (as early as Word 1.0), so I’m going to first try using the native Word enumerations for these character strings wherever possible.  If I have to go back to using these character sequences, then I’ll drop them back in to the Convert class as Constants.
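    If the sequences do end up in the Convert class as constants, something like the sketch below might do.  The mapping is my reading of Word’s Find and Replace codes (^p paragraph mark, ^l manual line break, ^m manual page break, ^s nonbreaking space) and hasn’t been checked against how the macro actually uses them; the FindCodes module and ExpandFindCodes function are hypothetical names.

    ```vbnet
    Imports Microsoft.VisualBasic

    Public Module FindCodes
        ' Find-and-Replace code -> the character it is assumed to stand for.
        Public ReadOnly ParagraphMark As String = vbCr            ' ^p
        Public ReadOnly ManualLineBreak As String = vbVerticalTab ' ^l
        Public ReadOnly ManualPageBreak As String = vbFormFeed    ' ^m
        Public ReadOnly NonBreakingSpace As String = ChrW(160)    ' ^s

        ' Hypothetical helper: swaps the caret codes in a string for the real characters.
        Public Function ExpandFindCodes(ByVal text As String) As String
            Dim result As String = text.Replace("^p", ParagraphMark)
            result = result.Replace("^l", ManualLineBreak)
            result = result.Replace("^m", ManualPageBreak)
            result = result.Replace("^s", NonBreakingSpace)
            Return result
        End Function
    End Module
    ```

    Keeping the mapping in one place means that if a code turns out to mean something else, there is only one line to change.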

    Aside: today I stumbled on an invaluable reference: the Microsoft Word Visual Basic Reference online.  This implies it’s an authoritative reference for all VBA available in Microsoft Word.  Should prove useful.


    From what I can tell by a single read through this routine’s code, this all appears to affect the ActiveDocument.  That means all this code can go into the InitializeActiveDocument() subroutine (which I’ve conveniently already defined).

    • MW_SetOptions_2003() is just caching the Application.Options.SmartParaSelection value and then returning it once conversion is complete.  This can be handled as with the other cached settings.
    • I don’t understand this code fragment at all:
          'Now, if we might have some problems, if we are in a table
          If Selection.Information(wdWithInTable) Then Selection.SplitTable

    • If a variable like convertPageHeaders is always False (and I can’t find anything that sets it True), then why would such a huge block of code be hidden inside this code block:
      If GetReg("convertPageHeaders")... End If
      It’s just hard to guess what the programmer’s intentions were with a never (rarely?) called piece of code.
    • Then there’s a lot of boring code conversion, where I’m just giving methods and variables more meaningful names, adding appropriate prefixes to all the Word enums being used, and just commenting the crap out of things where I don’t have a clue how to fix some weird or cryptic code routines.

    Reference to a non-shared member requires an object reference

    The most interesting thing I’ve had to research so far was the problem I created for myself by implementing the code into two classes (so far).  I finally got around to calling the Convert class’ public methods in the ThisAddin class’ uiConvert_Click() handler.  As the naive little programmer that I am, I of course first tried to just set the Imports statement at the top of the ThisAddin class, and then call the public methods “naked” like so:


    Of course that didn’t work, but I didn’t know why at the time.  Instead, I scratched my head for quite a while over how to handle the compiler error “Error 232: Reference to a non-shared member requires an object reference”.

    I’ve run up against this before, and I’m pretty sure I was lured at the time down the path to hell: I started adding Shared declarations all over the place.  It’s really tempting: when the IDE implies you should try an easy fix like this, it’s hard to know why this should be bad.  “Didn’t the IDE’s developers know what they were doing?”  “Why would they lead morons like me astray?”

    Unfortunately, this is akin to tugging at that first loose strand of a nice wool sweater: pretty soon I’d added so many additional Shared declarations that I’m sure the code was wide open to all sorts of future, stealthy issues I have no idea about.

    This time around, once I saw that one Shared begat yet another implied request to add another Shared declaration, I stopped and did some further digging around.  While I wasn’t able to find any articles or MSDN docs that really spelled it out for me, I think I figured out a worthy approach on my own.  [This forum thread was as good as any.]

    I’ve published the following as Community Content to the “Error 232” page on MSDN.

    Avoid adding the Shared keyword

    While this error message tempts the inexperienced programmer with the “easy” solution of just adding the Shared keyword to the requested Method, I advise strongly against it.  Unfortunately there’s little documentation or advice out there aimed at programmers like me who don’t really understand the problems they’ve created, nor the trade-offs in the possible solutions being (cryptically) recommended.  Hopefully this’ll help other folks like me avoid the really nasty mistake I’ve already made a few times.

    The trouble with adding the Shared keyword to a second Class’ Method is that it rarely stops there.  Once you’ve shared a method, whether Public, Private or otherwise, many of the instance members that method uses will also need adjustments, because a Shared method has no instance to reach them through.  At least in my experience, the first Shared keyword will work as well as cutting off the Hydra’s head: it usually leads to one or more instances of the error “Error 227: Cannot refer to an instance member of a class from within a shared method or shared member initializer without an explicit instance of the class.”  The first time I tried to kill this Hydra, I tried to rewrite a bunch of code, and ended up with a rat’s nest of Shared keywords scattered everywhere.

    A Better Approach than Adding the Shared Keyword

    As the advice on this page (cryptically) recommends, try creating an instance of the class.  The big fear that initially scared me off was that I’d end up either (a) unknowingly creating and destroying tons of unnecessary instances of that Class as objects, or (b) not understanding when the object I’d created fell out of scope (and would creep up on me with unpredictable garbage collection-derived errors).

    What I did to alleviate this issue was to declare a “class-level” variable in the calling class of the type of the class being called, and then use that variable as the root of all subsequent uses of the called class’ methods.

    This example should illustrate:

    ' Note: "Imports BusinessLogic" must sit at the top of the file, not inside a class.
    ' It only brings Shared members into scope, so it doesn't help with Error 232.
    Public Class BusinessLogic   ' This is the "called" class
        Public Sub PerformAction()
        End Sub
        Private Sub Action()
        End Sub
    End Class
    Public Class UserInterface   ' This is the "calling" class
        Private WithEvents uiButton As Microsoft.Office.Core.CommandBarButton ' needed by the Handles clause below
        Private documentLogic As New BusinessLogic ' class-level variable, created once per UserInterface instance
        Private Sub uiButton_Click(ByVal Ctrl As Microsoft.Office.Core.CommandBarButton, ByRef CancelDefault As Boolean) Handles uiButton.Click
            ' PerformAction()             ' Uncommenting this causes Error 232: there is no object reference
            documentLogic.PerformAction() ' This call is OK
        End Sub
    End Class

    Y’know, sometimes I’m just documenting this stuff for myself, since I know that in a few weeks’ time I’ll have completely forgotten the solution and the logic behind it.  The rest of you happen to be benefiting from my lack of memory, and I wish I could say I was being completely selfless, but I’m getting too old to be lying to folks I never even met. 🙂

    Five Ways to Use Visual Studio to Avoid Secure Coding Mistakes

    I was talking with a colleague recently, and we got on the subject of static analysis and why we all have to suffer with the problem of first making the mistakes in code, and then fixing them later.  She challenged me to come up with some ways that we could avoid the mistakes in the first place, and here’s what I told her:

    1. IntelliSense — the Visual Studio IDE is pretty smart about providing as-you-type hints and recommendations on all sorts of common coding flaws (or at least, it catches me on a lot of the mistakes that I frequently make), and they’re enabled out of the box (at least for Visual Basic.NET — I can’t recall if that’s true for C# as well).  [But I wonder why IntelliSense doesn’t handle some of the basic code maintenance?]
    2. Code snippets — Visual Studio has a very handy feature that allows you to browse a self-describing tree of small chunks of code, that are meant to accomplish very specific purposes.  These snippets save lots of time on repetitive or rarely-used routines, and reduce the likelihood of introducing errors in similar hand-coded blocks of code.
    3. PInvoke.net — if you ever need to P/Invoke to Win32 APIs (aka unmanaged code), this free Visual Studio add-on gives you as definitive a library as exists of recommended code constructs for doing this right.
    4. Code Analysis (cf. FxCop) — this is a bit of a cheat, as these technologies at first are simply about scanning your code (MSIL in fact) to identify flaws in your code (including a wide array of security-related flaws).  However, with the very practical tips they provide on how to resolve the coding flaw, this quickly becomes a teaching tool to reinforce better coding behaviours so you (and I) can avoid making those mistakes again in the future.
    5. Community resources — F1 is truly this coder’s best friend.  Banging on the F1 key in Visual Studio brings up a multi-tabbed search UI that gives you access not only to local and online versions of MSDN Library, but also to two collections that I personally rely on heavily: the CodeZone community (a group of MS-friendly code-junkie web sites with articles, samples and discussions) and the MSDN Forums (Microsoft’s dazzling array of online Forums for discussing every possible aspect of developing for the Microsoft platform).  If there’s one complaint I have about the MSDN Forums, it’s that there are so freakin’ many of them, it’s very easy to end up posting your question to the wrong Forum, only to have the right one pointed out to you later (sometimes in very curt, exasperated, “why do these morons keep showing up?” form).

    However, if like me you’re not satisfied with just the default capabilities of Visual Studio, then try out some of these add-ons to enhance your productivity:

    • Code snippets:
      • A large number of third-party code snippets are available from http://www.gotcodesnippets.net as well (though the quality of these is totally unverified, and should be approached with caution).


    • Code Analysis (FxCop):
      • JSL FxCop — a coding tool that eases the difficulty of developing custom rules, as well as a growing library of additional rules that weren’t shipped by Microsoft.
      • Detecting and Correcting Managed Code Defects — MSDN Team System walkthrough articles for the Code Analysis features of Visual Studio.

    I’m also working on trying to figure out how to add a set of custom sites to the Community search selections (e.g. to add various internal Intel web sites as targets for search).

    Memory leaks, GDI Objects and Desktop Heap – Windows registry changes for high-memory systems

    In case I haven’t blogged about it this year, I wanted to share the usual fix-up that needs to be done to make full use of more than, say, 512 MB of RAM:

    I had to swap “shells” recently, dropping my laptop’s hard drive into a replacement chassis. I realized later that it had half the usual RAM, and it took a few weeks to get back to the 2 GB I was supposed to have.

    On the suspicion that Windows might readjust its memory allocation parameters if it detects less memory than it started with, I figured I’d check on it after getting the RAM upgraded back to 2 GB. Sure enough, things are back to the defaults:

    • the “Windows SharedSection” portion of the Subsystems\Windows Registry setting was configured to 1024,3072,512, and like Matt I boosted it to 1024,8192,2048
    • the “SessionViewSize” Registry setting was configured to 48 MB, and I boosted it to 64 MB (just another multiple of 16, and figured a little more probably goes a long way).
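    To check what a machine is currently set to before changing anything, a read-only sketch like the one below might help.  It assumes both settings live under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager (the “Windows” value under SubSystems, and “SessionViewSize” under Memory Management); the DesktopHeapCheck module and its method names are made up for illustration.

    ```vbnet
    Imports System
    Imports Microsoft.Win32

    Public Module DesktopHeapCheck
        ' Assumed location of both settings; verify in regedit before relying on it.
        Private Const SessionManagerKey As String = "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager"

        ' Pulls the comma-separated SharedSection numbers out of the long "Windows" value.
        Public Function ParseSharedSection(ByVal windowsValue As String) As String()
            Const marker As String = "SharedSection="
            If String.IsNullOrEmpty(windowsValue) Then Return New String() {}
            Dim start As Integer = windowsValue.IndexOf(marker, StringComparison.OrdinalIgnoreCase)
            If start < 0 Then Return New String() {}
            start += marker.Length
            Dim finish As Integer = windowsValue.IndexOf(" "c, start)
            If finish < 0 Then finish = windowsValue.Length
            Return windowsValue.Substring(start, finish - start).Split(","c)
        End Function

        ' Prints the current values; it never writes to the registry.
        Public Sub PrintCurrentSettings()
            Dim windowsValue As String = CStr(Registry.GetValue(SessionManagerKey & "\SubSystems", "Windows", ""))
            Console.WriteLine("SharedSection: " & String.Join(",", ParseSharedSection(windowsValue)))

            Dim viewSize As Object = Registry.GetValue(SessionManagerKey & "\Memory Management", "SessionViewSize", Nothing)
            If viewSize Is Nothing Then
                Console.WriteLine("SessionViewSize: not set")
            Else
                Console.WriteLine("SessionViewSize (MB): " & viewSize.ToString())
            End If
        End Sub
    End Module
    ```

    The actual changes are still best made by hand in regedit, followed by a reboot, since a bad SharedSection string can stop Windows from starting.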

    Now go and do likewise.