Porting Word2MediaWikiPlus to VB.NET: Part 3

[This series has three previous articles: the prologue, Part 1 and Part 2.]

Digging into modWord2MediaWikiPlus

This is the motherlode, right?  Here’s where all the action happens — all the reformatting, text extraction and Wiki-izing.  Yes, this monster VBA module probably has the majority of the code I’ll be porting over to VSTO.

So here’s the full list of the Functions and Subs contained therein.  The best way for me to get to know this beast is to strip it down, piece by piece:

  • MW_CloseProgramm()
  • Word2MediaWikiPlus_Config()
  • Word2MediaWikiPlus_Upload()
  • MediaWikiConvert_CleanUp()
  • MediaWikiConvert_Comments()
  • MediaWikiConvert_EscapeChars()
  • MediaWikiConvert_Fields()
  • MediaWikiConvert_FontColors()
  • MediaWikiConvert_FootNotes()
  • MediaWikiConvert_FormFields()
  • MediaWikiConvert_Headings()
  • MediaWikiConvert_HTMLChars()
  • MediaWikiConvert_Hyperlinks()
  • MW_ImageInfoReset()*
  • MediaWikiExtract_Images()*
  • MediaWikiExtract_ImagesHtml()*
  • MediaWikiExtract_ImagesPhotoEditor()*
  • MediaWikiConvert_Indention()
  • MediaWikiConvert_IndentionTab()
  • MediaWikiConvert_Lists()
  • MediaWikiConvert_Paragraphs()
  • MediaWikiConvert_Prepare()
  • MediaWikiConvert_Tables()
  • MediaWikiConvert_TabTables()
  • MediaWikiConvert_TextFormat()
  • MediaWikiImageUpload()*
  • MediaWikiOpen()
  • MediaWikiReplaceQuotes()
  • MW_CheckFileName()
  • MW_CheckFileNameTitle()
  • MW_ClearFormatting()
  • MW_Convert_Table()
  • MW_ReplaceSpecialCharactersFirst()
  • MW_ReplaceCharacter()
  • MW_FindNormalWidth()
  • MW_FontFormat()
  • MW_FormatCategoryString()
  • MW_GetEditorPath()
  • MW_GetImageNameFromFile()*
  • MW_GetImagePath()*
  • MW_GetScaleIS()*
  • MW_GetUserLanguage()
  • MW_ImageExportPowerpointPNG()*
  • MW_ImageExtract()*
  • MW_ImageExtract2()*
  • MW_ImagePathName()*
  • MW_ImageUpload_File()*
  • MW_Initialize()
  • MW_InsertPageHeaders()
  • MW_LanguageTexts()
  • MW_PhotoEditor_Convert()*
  • MW_PowerpointQuit()
  • MW_ReplaceString()
  • MW_ScaleMax()*
  • MW_ScaleMaxOK()*
  • MW_SearchAddress()
  • MW_SetWikiAddressRoot()
  • MW_ChangeView()
  • MW_Statusbar()
  • MW_SurroundHeader()
  • MW_TableInfo()
  • MW_WordVersion()
  • GetRegValidate()
  • RemoveDir()
  • MediaWikiExtract_ImagesHtml2002()*
  • MakeDir()
  • TestSendMessage()
  • TestImageInfo()*
  • TestUnicode()
  • TestReadUnicode()
  • TestCopyDoc()
  • MW_SetOptions_2003()

My initial reaction

A few things stand out for me so far in this code module:

  1. Some functions are prefixed “MW_”, others “MediaWiki”, but only those with the “MediaWiki” prefix have the “copyright by Gunter Schmidt” notice.  I imagine the “MW_” functions are inherited from another codebase.  Just something to watch out for.
  2. There are references to Word 97 and Word 2000 in this code, but it occurs to me that VSTO probably doesn’t support anything older than Word 2002 (Office XP) or Word 2003.  I should check on that, and then I’ll know what code has to be cut out.
  3. There’s a ton of code that’s focused on migrating images from the source document to the target wiki page…including a bunch of SendKeys operations that I at first suspected couldn’t be easily implemented in .NET.  However, it looks like .NET implements this as the System.Windows.Forms.SendKeys class.
  4. I’m puzzled by the appearance of PowerPoint in the code — I wonder what it’s really being used for?
  5. There’s also a bunch of talk of using Microsoft Photo Editor (which seems to have died off at Office XP) — I wonder if .NET 2.0 has any image-manipulation classes that will relieve us of this dependency?  Photo Editor’s replacement (Picture Manager) doesn’t provide much in the way of editing functionality, and Microsoft’s other suggestion (Digital Image Pro) has recently been “de-hired” as well.
  6. There is a framework in this code for supporting a wide variety of languages, but as far as I can tell only English and German are actually enabled.
  7. There are frequent instances of debug code in there, which I presume stands in for the built-in breakpoints and exception handling that Visual Studio 2005 provides.  I’ll likely drop it entirely, except where DEBUG or TRACE functionality would be useful.
  8. I’m also noticing a trend towards using the Registry to store a bunch of temporary settings that are only relevant to this Macro, not to Word or Windows in general.  I’ll likely convert this over to the XML Settings approach of .NET — there’s something that just seems “wrong” about using the Registry for such ephemeral data (except of course when there are no other options).
  9. I’m fascinated by the naming of all the objects in this code, and looking forward to making things a lot easier to understand.  There are plenty of cryptic variables, and the function prefixes won’t be needed once an OO hierarchy is available to supply the context.  I’ll do my best to follow the Microsoft guidelines for writing good .NET code.  The more of others’ code I read, the more I appreciate those few who follow these guidelines.

Plan of attack

  1. I think I’m going to have to carve this down into releases, and not try to implement this all at once.
  2. I’m thinking that the image-manipulation code, while very sexy and useful, isn’t critical for the core use cases of a Word-based DOC-to-MediaWiki converter.  [All those functions are labelled with a * in the list above.]  These are probably for “release 3”.
  3. The table-manipulation code seems more important, but still it’s probably not essential.  That seems like a “release 2” kind of thing.
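
To make that split concrete, here is a small throwaway sketch in Python (a planning aid only, not part of the VBA module or the eventual VB.NET port). It sorts a sample of entries from the list above into the three tentative releases. The "*" rule comes straight from the plan; treating any name that contains "Table" as table-manipulation code is just my assumption, and the names `release_for` and `triage` are made up for this illustration.

```python
# Sample of entries copied from the function list above.
# A trailing "*" marks image-related code.
FUNCTIONS = [
    "MediaWikiConvert_Headings()",
    "MediaWikiConvert_Tables()",
    "MediaWikiConvert_TabTables()",
    "MediaWikiExtract_Images()*",
    "MW_Convert_Table()",
    "MW_ImageExtract()*",
    "MW_TableInfo()",
    "MW_ReplaceString()",
]


def release_for(entry: str) -> int:
    """Return the tentative release number for one list entry."""
    if entry.endswith("*"):
        return 3  # image manipulation, per the plan of attack
    name = entry.rstrip("*()")
    if "table" in name.lower():
        return 2  # assumption: "Table" in the name means table code
    return 1  # everything else is a candidate for the first release


def triage(entries):
    """Group bare function names by tentative release number."""
    plan = {1: [], 2: [], 3: []}
    for entry in entries:
        plan[release_for(entry)].append(entry.rstrip("*()"))
    return plan


if __name__ == "__main__":
    for release, names in triage(FUNCTIONS).items():
        print(f"release {release}: {', '.join(names)}")
```

Running it over the whole list would give a first-cut backlog per release, but the name-based guess is crude: a function that touches tables or images without saying so in its name would land in the wrong bucket and need sorting by hand.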

Right now I’m feeling like I have a pretty good handle on what’s in store for me, and the “itch” to stop planning and start coding is getting to be too much to resist.  I think I’ll make a list of the top ten things I’ll want to start on, and then just get down to it.

Join us again — same Bat time, same Bat channel!
