Porting Word2MediaWikiPlus to VB.NET: Part 3

[This series has three previous articles: the prologue, Part 1 and Part 2.]

Digging into modWord2MediaWikiPlus

This is the motherlode, right?  Here’s where all the action happens — all the reformatting, text extraction and Wiki-izing.  Yes, this monster VBA module probably has the majority of the code I’ll be porting over to VSTO.

So here’s a canonical list of the Functions and Sub’s contained therein — the best way for me to get to know this beast is to strip it down, piece by piece:

  • MW_CloseProgramm()
  • Word2MediaWikiPlus_Config()
  • Word2MediaWikiPlus_Upload()
  • MediaWikiConvert_CleanUp()
  • MediaWikiConvert_Comments()
  • MediaWikiConvert_EscapeChars()
  • MediaWikiConvert_Fields()
  • MediaWikiConvert_FontColors()
  • MediaWikiConvert_FootNotes()
  • MediaWikiConvert_FormFields()
  • MediaWikiConvert_Headings()
  • MediaWikiConvert_HTMLChars()
  • MediaWikiConvert_Hyperlinks()
  • MW_ImageInfoReset()*
  • MediaWikiExtract_Images()*
  • MediaWikiExtract_ImagesHtml()*
  • MediaWikiExtract_ImagesPhotoEditor()*
  • MediaWikiConvert_Indention()
  • MediaWikiConvert_IndentionTab()
  • MediaWikiConvert_Lists()
  • MediaWikiConvert_Paragraphs()
  • MediaWikiConvert_Prepare()
  • MediaWikiConvert_Tables()
  • MediaWikiConvert_TabTables()
  • MediaWikiConvert_TextFormat()
  • MediaWikiImageUpload()*
  • MediaWikiOpen()
  • MediaWikiReplaceQuotes()
  • MW_CheckFileName()
  • MW_CheckFileNameTitle()
  • MW_ClearFormatting()
  • MW_Convert_Table()
  • MW_ReplaceSpecialCharactersFirst()
  • MW_ReplaceCharacter()
  • MW_FindNormalWidth()
  • MW_FontFormat()
  • MW_FormatCategoryString()
  • MW_GetEditorPath()
  • MW_GetImageNameFromFile()*
  • MW_GetImagePath()*
  • MW_GetScaleIS()*
  • MW_GetUserLanguage()
  • MW_ImageExportPowerpointPNG()*
  • MW_ImageExtract()*
  • MW_ImageExtract2()*
  • MW_ImagePathName()*
  • MW_ImageUpload_File()*
  • MW_Initialize()
  • MW_InsertPageHeaders()
  • MW_LanguageTexts()
  • MW_PhotoEditor_Convert()*
  • MW_PowerpointQuit()
  • MW_ReplaceString()
  • MW_ScaleMax()*
  • MW_ScaleMaxOK()*
  • MW_SearchAddress()
  • MW_SetWikiAddressRoot()
  • MW_ChangeView()
  • MW_Statusbar()
  • MW_SurroundHeader()
  • MW_TableInfo()
  • MW_WordVersion()
  • GetRegValidate()
  • RemoveDir()
  • MediaWikiExtract_ImagesHtml2002()*
  • MakeDir()
  • TestSendMessage()
  • TestImageInfo()*
  • TestUnicode()
  • TestReadUnicode()
  • TestCopyDoc()
  • MW_SetOptions_2003()

My initial reaction

A few things stand out for me so far from this code module:

  1. Some functions are prefixed “MW_”, others “MediaWiki”, but only those with the “MediaWiki” prefix have the “copyright by Gunter Schmidt” notice.  I imagine the “MW_” functions are inherited from another codebase.  Just something to watch out for.
  2. There are references to Word 97 and Word 2000 in this code, but it occurs to me that VSTO probably doesn’t support anything less than Word XP or Word 2003.  I should check on that, and then I’ll know what code has to be cut out.
  3. There’s a ton of code that’s focused on migrating images from the source document to the target wiki page…including a bunch of SendKeys operations that I at first suspected couldn’t be easily implemented in .NET.  However, it looks like .NET implements this as the System.Windows.Forms.SendKeys class.
  4. I’m puzzled by the appearance of PowerPoint in the code — I wonder what it’s really being used for?
  5. There’s also a bunch of talk of using Microsoft Photo Editor (which seems to have died off at Office XP) — I wonder if .NET 2.0 has any image-manipulation classes that will relieve us of this dependency?  Photo Editor’s replacement (Picture Manager) doesn’t provide much in the way of editing functionality, and Microsoft’s other suggestion (Digital Image Pro) has recently been “de-hired” as well.
  6. There is a framework in this code for supporting a wide variety of languages, but as far as I can tell only English and German are actually enabled.
  7. There are frequent instances of debug code in there, which I presume is just used as the equivalent to the built-in Breakpoint & exception handling in Visual Studio 2005.  I’ll likely drop it entirely, except where DEBUG or TRACE functionality would be useful.
  8. I’m also noticing a trend towards using the Registry to store a bunch of temporary settings that are only relevant to this Macro, not to Word or Windows in general.  I’ll likely convert this over to the XML Settings approach of .NET — there’s something that just seems “wrong” about using the Registry for such ephemeral data (except of course when there are no other options).
  9. I’m fascinated by the naming of all the objects in this code, and looking forward to making things a lot easier to understand.  All these cryptic variables, the prefixes that won’t be needed once the OO hierarchy is actually available to get the context for functions.  I’ll do my best to implement all the Microsoft guidelines for writing good .NET code – the more of others’ code I read, the more I appreciate those few who follow these guidelines.
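On the question in point 5 about image-manipulation classes: .NET 2.0 does ship the System.Drawing namespace, so here is a minimal sketch of what a Photo Editor-free resize might look like.  The module name, the SaveScaledPng helper and its parameters are all made up for illustration, and I’m only assuming that "shrink to fit and save as PNG" is the kind of job the original image code needs done.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Drawing2D
Imports System.Drawing.Imaging

Module ImageScalingSketch

    ' Hypothetical helper (not from the original macro): shrinks an image so it
    ' fits inside maxWidth x maxHeight, keeping the aspect ratio, and saves a PNG.
    ' Images that already fit are saved at their original size.
    Public Sub SaveScaledPng(ByVal sourcePath As String, ByVal targetPath As String, _
                             ByVal maxWidth As Integer, ByVal maxHeight As Integer)
        Using source As Image = Image.FromFile(sourcePath)
            ' Pick the smaller of the two ratios, and never scale above 100%.
            Dim ratio As Double = Math.Min(maxWidth / CDbl(source.Width), _
                                           maxHeight / CDbl(source.Height))
            ratio = Math.Min(1.0, ratio)

            Dim newWidth As Integer = Math.Max(1, CInt(Math.Round(source.Width * ratio)))
            Dim newHeight As Integer = Math.Max(1, CInt(Math.Round(source.Height * ratio)))

            Using target As New Bitmap(newWidth, newHeight)
                Using g As Graphics = Graphics.FromImage(target)
                    g.InterpolationMode = InterpolationMode.HighQualityBicubic
                    g.DrawImage(source, 0, 0, newWidth, newHeight)
                End Using
                target.Save(targetPath, ImageFormat.Png)
            End Using
        End Using
    End Sub

End Module
```

One caveat with this sketch: Image.FromFile keeps the source file locked until the image is disposed, so targetPath needs to be a different file from sourcePath.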

Plan of attack

  1. I think I’m going to have to carve this down into releases, and not try to implement this all at once.
  2. I’m thinking that the image-manipulation code, while very sexy and useful, isn’t critical for the core use cases for a Word-based DOC-to-MediaWiki converter.  [All those functions are labelled with a *.]  These are probably for “release 3”.
  3. The table-manipulation code seems more important, but still it’s probably not essential.  That seems like a “release 2” kind of thing.

Right now I’m feeling like I have a pretty good handle on what’s in store for me, and the “itch” to stop planning and start coding is getting to be too much to resist.  I think I’ll make a list of the top ten things I’ll want to start on, and then just get down to it.

Join us again — same Bat time, same Bat channel!

Porting Word2MediaWikiPlus to VB.NET: Part 2

[This series has two previous articles: the prologue and Part 1.]

Initial source code examination

So I figured I’d get right into it.  I opened up Word 2003, went into the Tools > Macro > Macros… menu, and thinking there’d be some obvious Import function, hit the Organizer button.  No such luck, so I tried again via Tools > Macro > Visual Basic Editor, muddled around for a few minutes, and eventually just imported all the files via the File > Import File… menu option.

I don’t know much about how VBA multi-file applications are organized, but I figured that the one listed under the Class Modules folder would likely be the right one to start — Classes are the basic building block for all OO programs, right?

I had a quick look through the Sub’s defined in Class Modules > ThisDocument1 (which must correspond to the ThisDocument.cls file), and thought, “heh, this’ll be easy — there’s only a couple hundred lines of code and a handful of Sub’s”:

  • cmdCopyModuls_Click()
  • CreateSymbol()
  • CreateSymbol2()
  • CopyModulesToNormal()
  • cmdSymbols_Click()
  • cmdUninstall_Click()

Ah, but wait: there’s _Click() routines in them thar hills…which means there’s Form UI to deal with, and now that I look more closely, those .BAS files ended up as a list of entries under the Modules folder.


Yipes!  There are a few more complicated questions to deal with than I’d originally thought.  For example, modW2MWP_FileDialog contains at least some code with a copyright heading, which could make life a bit tougher:

'***************** Code Start **************
'This code was originally written by Ken Getz.
'It is not to be altered or distributed, except as part of an application.
'You are free to use it in any application, provided the copyright notice is left unchanged.
' Code courtesy of:
' Microsoft Access 95 How-To
' Ken Getz and Paul Litwin
' Waite Group Press, 1996

This may not be so bad though, as the Functions in this module referring to file operations (e.g. GetOpenFile(), ahtCommonFileOpenSave()) may be superfluous with the System.IO namespace available in VSTO.  We might not need these file I/O functions, though TestIt(), ahtAddFilterItem(), TriumNull() and TrimTrailingNull() may be needed.

This brings up a really good point that I’ve seen mentioned in a couple of places, including John R. Durant’s blog from two years ago:

The real issue I see in migrating from VBA to VSTO is not the language or switching to a new runtime. […]  The more important factor is how the migration will affect the architecture of the application.  It is important to ask questions like: Does the new runtime make it possible for me to code this differently at a more fundamental level?  Can I cut lots of code? […]
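As one possible answer to "Can I cut lots of code?": if GetOpenFile() and ahtCommonFileOpenSave() are mainly there to show a file-open dialog (which is my assumption from their names, not something I’ve confirmed in the source), then the .NET Framework’s System.Windows.Forms.OpenFileDialog class could stand in for them.  A minimal sketch, where the module name, the PromptForDocument function, the dialog title and the filter string are all hypothetical:

```vbnet
Imports System
Imports System.Windows.Forms

Module FileDialogSketch

    ' Hypothetical replacement for a hand-rolled file-open wrapper.
    ' Returns the chosen path, or Nothing if the user cancels the dialog.
    Public Function PromptForDocument() As String
        Using dialog As New OpenFileDialog()
            dialog.Title = "Select a Word document"
            ' Filter entries come in "description|pattern" pairs.
            dialog.Filter = "Word documents (*.doc)|*.doc|All files (*.*)|*.*"
            dialog.CheckFileExists = True

            If dialog.ShowDialog() = DialogResult.OK Then
                Return dialog.FileName
            End If
        End Using
        Return Nothing
    End Function

End Module
```

If that assumption holds, the filter string here would cover what ahtAddFilterItem() appears to build up by hand, though I’d want to read the original more closely before deleting anything.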

This’ll be a fine line for me to walk: I’d like to make the conversion as quickly as possible, but it may not always be easy to figure out what the original code was meant to do.  It doesn’t help matters that many of the Comments in the source code are in German — as much as I’d like to think I can still understand German, it’s been sixteen years since I was in Germany and I’m more than a little rusty.  Hopefully the online German-to-English dictionaries will be able to sort out the gist of it.


The module modW2MWP_Registry is almost entirely focused on Registry interactions (which are well represented by the Microsoft.Win32.Registry class) except for the Uninstall_Word2MediaWikiPlus() Sub.  However, any necessary code has a more appropriate home in the VSTO add-in’s ThisAddIn_Shutdown() Sub.
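For reference, here is a minimal sketch of what reading and writing a per-user setting looks like with the Microsoft.Win32 classes.  The key path, the module name and the ReadSetting/WriteSetting helpers are invented for illustration; I haven’t yet mapped them to the key and value names the macro really uses.

```vbnet
Imports System
Imports Microsoft.Win32

Module RegistrySketch

    ' Assumption: this key path is made up for illustration; the macro's real
    ' key and value names may well differ.
    Private Const SettingsKeyPath As String = "Software\Word2MediaWikiPlus\Settings"

    ' Reads a string value from HKEY_CURRENT_USER, or returns defaultValue
    ' when the key or the value does not exist.
    Public Function ReadSetting(ByVal name As String, ByVal defaultValue As String) As String
        Dim key As RegistryKey = Registry.CurrentUser.OpenSubKey(SettingsKeyPath)
        If key Is Nothing Then
            Return defaultValue
        End If
        Try
            Return CStr(key.GetValue(name, defaultValue))
        Finally
            key.Close()
        End Try
    End Function

    ' Writes a string value under HKEY_CURRENT_USER, creating the key if needed.
    Public Sub WriteSetting(ByVal name As String, ByVal value As String)
        Dim key As RegistryKey = Registry.CurrentUser.CreateSubKey(SettingsKeyPath)
        Try
            key.SetValue(name, value)
        Finally
            key.Close()
        End Try
    End Sub

End Module
```

A call such as ReadSetting("WikiAddressRoot", "") would then return an empty string on a machine where nothing has been saved yet (the value name is, again, just a placeholder).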


The modWord2MediaWikiPlus “module” is a monster.  It’s a wonder it hasn’t been broken down into a whole namespace of classes, or I suppose in the VBA world, the closest they have is “a set of Modules” [no hierarchy].  And it’s the central code:

' Function: Converts a word document to the wiki syntax

Anyway, it has a great deal of functionality — I’ll dig into it later.  This’ll be the majority of the work here.


This module is much smaller than modWord2MediaWikiPlus, but has some interesting functionality as well.  After studying its functions for a bit, I think I’ve figured out which general category each of them falls into:

Filesystem functions

Process functions

Text functions


Oh, and a DisplayError() for good measure.


I think I’m ready to start digging through the code in modWord2MediaWikiPlus – wish me luck!

Join us again — same Bat time, same Bat channel!

Porting Word2MediaWikiPlus to VB.NET: Part 1

[This series of articles starts off with a prologue here.]

Initial Setup

First up, I’d downloaded the source code from here.  When I downloaded it the browser would only let me choose between htm, mht and txt format (I chose txt).  Then, figuring that my work would be easiest if I opened the file in Visual Studio 2005 Team edition (with the VSTO SE addition), I needed to figure out what file extension I should assign to ensure Visual Studio recognized it for the kind of code that it is (and get as much of the Visual Studio Intellisense auto-completion and auto-colouring as possible).

I tried .BAS and .VBA, but neither of those extensions had any associated icons, and when I opened the .VBA, every line was displayed entirely in black.

Then, in exploring the filesystem to re-open the file, I noticed I’d created a C#-based project [since my VS2005 install is oriented to C#, having forced myself to write an app in it to prove to myself I wasn’t forever tainted by my earlier VB.NET experiences] instead of VB.NET, so I deleted that one and tried again (File > New > Project > Other languages > Visual Basic > Office > 2003 Add-ins > Word Add-in).  I chose Word 2003 because (a) that’s what I’m currently using, (b) that’s the platform I have a little experience with, and (c) it’s what most folks around the world would be using (at least, much more so than Word 2007).  I chose to name the solution Word2MediaWiki++, to honour the fine work upon which I’m building.

[I’ve thought about whether this will become just one Project in a larger Solution, and I assume at some point there’ll be a need to rename either the Solution or this first Project to distinguish between this single piece of functionality and the overall Word-to-MediaWiki functionality.  It’s quite likely in fact, but I don’t want to overthink this or over-engineer the proceedings too badly – I have to remind myself that just getting the same functionality in VSTO add-in form is my primary goal.]

Quick Aside

I’ve done some reading about VBA to VSTO conversions in the past (while I was still at Microsoft), and I’d actually developed another Word 2003 add-in, from scratch, as a way to provide a platform for re-implementing a very comprehensive VBA macro-based application we were using.  So I already had some idea that this was possible, and that there are many resources (from Microsoft* and elsewhere) to help both in making the high-level transition and in converting many of the low-level VBA constructs.

A simple search on Google for “vba to vsto” comes up with 828,000 results, and many of them on the first page or two are good, first-person lessons.  However, the articles/white papers/blog posts I’d previously read that had given me the confidence I’m building on now include these:


Oh, and before I actually go do any work on this code, I should double-check that the original author hasn’t put any restrictions on it (copyright, restrictive license, “all rights reserved”)…

… So according to the SourceForge home for W2MWP, this is licensed under the GPL.  It looks like I’m in the clear, so long as I publish this back out under GPL too – wouldn’t want to run afoul of the GPL police now would we?


Join us again – same Bat time, same Bat channel!

[*Footnote: My only gripe about Microsoft’s VSTO SE add-in work is that they’re so freakin’ focused on Office 2007 development that I have to do some serious detective work – either trolling through old VBA or pre-add-in documentation and hoping the classes are similar, or winding through the “everything is wonderful” Office 2007 documentation and hoping the stuff I need hasn’t been reinvented for the wondrous new world of Office 12.]

From VBA to VSTO: porting Word2MediaWikiPlus to VB.NET

I got religion about Wikis a while back, and recently I’ve had an incentive to look into the world of conversion applications.  I’m looking into the applications that are available to convert from one format (e.g. Office documents) to MediaWiki format (i.e. the engine behind the venerable Wikipedia).

I went wandering through the Wikipedia Tools pages and when I found the Office-oriented tools, I was surprised that many of them were implemented as VBA macros.  One in particular caught my eye: the one labeled Word2MediaWikiPlus.  It’s a single VBA macro, with a small number of functions, and it looks just ripe for creating a VSTO add-in.  Further, compared to the other Word macros, it seems to be a superset of them, and the others seem to have preceded it and/or to have since been abandoned.

I’m really getting into the community mindset, and I figure that not only will some of my colleagues appreciate having this kind of functionality, but that there’d be many folks on the ’net who’d probably use something like this as well.  And once a basic CommandBar framework is in place, and the source code is available for re-use, then anyone who’d like to add their own functionality to a VSTO add-in for Office should be able to leverage the basic framework I establish.

Sounds simple eh?  Just a little parsing through the object model, some digging through VBA-to-VB.NET conversions, and a little refresher on Office.CommandBar and its brethren – and voila!  One VSTO add-in that implements the same functionality as you see in the original VBA macro.  [Famous last words.]

No, I’m not quite that naive – I suspect it’ll take a good deal more than that before I’m through.  With that in mind, I’m considering a novel approach to this project: as I’m going through each stage of the conversion, I’ll blog about my experiences – the dead-end code paths I pursue, the inconsistencies in the Word object model, the under-documented features of the original source code, and all the places I find anything useful that keeps this moving towards completion.

Wish me luck, and I’ll keep you posted on my progress!