Win32OLE + DRb – Windows = Fun

Wednesday, February 8th, 2006

Posted to the Boston.rb mailing list

Since we didn’t get to any code last night, I wanted to share one of the things that I’ve gotten a kick out of.

It sounds like quite a few of us have already discovered how nice DRb is for helping less-able machines do more interesting things. I’ve used it twice: once because I couldn’t make SSL connections with net/https on Solaris 5.6, and once when I wanted to be able to use Word from *nix.

My Word solution consists of a few small parts in different places: a command-line program for human use, a little module that encapsulates what I wanted to do with Word, and a DRb server running on the Windows box.
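The three parts can be tried out in miniature before any Windows box is involved. The sketch below assumes only Ruby's drb library. `FakeWord` is a hypothetical stand-in for `WordHelper::Word` with the same method names; it only computes the output filename rather than driving Word. It is served over DRb on localhost and called through a proxy, the way the command-line tool calls the real server.

```ruby
require 'drb/drb'

# Hypothetical stand-in for WordHelper::Word: same method names,
# but it only computes the output filename instead of driving Word.
class FakeWord
  def start_word
    @running = true
  end

  def wdtowml(file)
    raise 'call start_word first' unless @running
    file.sub(/\.doc\z/i, '') + '.wml'
  end

  def quit
    @running = false
  end
end

# Server side: port 0 asks the OS for any free port.
DRb.start_service('druby://localhost:0', FakeWord.new)
server_uri = DRb.uri

# Client side: a DRbObject proxy for the server's front object.
# Within one process DRb sees the URI is local and calls the object
# directly; from another machine the same calls travel over TCP.
word = DRbObject.new_with_uri(server_uri)
word.start_word
result = word.wdtowml('R:\\reports\\q1.doc')
word.quit
puts "Converted to WML file: #{result}"

DRb.stop_service
```

Swapping `FakeWord` for the real class, and `localhost:0` for the Windows box's name and port, gives the same shape as the real setup.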

Note: The following code is not intended to be a secure, elegant solution ready for production deployment. Clean it up if you want to use it for real.

Requirements: A network-addressable Windows box with Word that you can run Ruby on. The Windows box and the calling boxen should share a drive, and you need to know the path to it on each side; mine is just an NFS-mounted drive. Some understanding of the Word object model is extremely useful.

This example converts Word (.doc) files into WordprocessingML files, the XML file format in Word 2003 (normally .xml, though I call them .wml).
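Part of the appeal of WordprocessingML is that the output is ordinary XML that *nix tools can read without Word. As a rough sketch (the sample below is hand-written and far smaller than what Word actually emits), paragraph text can be pulled out with Ruby's REXML library, assuming the Word 2003 `w:` namespace shown:

```ruby
require 'rexml/document'

# Namespace used by Word 2003 WordprocessingML (the w: prefix).
WORDML_NS = 'http://schemas.microsoft.com/office/word/2003/wordml'

# A tiny hand-written WordprocessingML document (illustrative only).
SAMPLE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <w:wordDocument xmlns:w="#{WORDML_NS}">
    <w:body>
      <w:p><w:r><w:t>Hello</w:t></w:r></w:p>
      <w:p><w:r><w:t>world</w:t></w:r></w:p>
    </w:body>
  </w:wordDocument>
XML

# Return the text of each paragraph (w:p), joining its text runs (w:t).
def paragraph_texts(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//w:p', 'w' => WORDML_NS).map do |para|
    REXML::XPath.match(para, './/w:t', 'w' => WORDML_NS).map(&:text).join
  end
end

p paragraph_texts(SAMPLE)
```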

Command-line tool for converting on *nix:

===============================================

#!/usr/bin/env ruby
require 'drb/drb'

PORT = 2774               # Some open port
HOSTNAME = 'foo.bar.com'  # Hostname or IP of the Windows box

# Start a local DRb service too, so objects the server returns
# by reference (rather than as copies) still work from here
DRb.start_service

# Connect to the Windows box
drb = DRbObject.new(nil, "druby://#{HOSTNAME}:#{PORT}")

# Ask it to make sure Word is running
drb.start_word

ARGV.each do |f|
  # Inelegant way of converting my *nix paths to something the
  # Windows box likes: /work/... here is R:\... over there
  unix_filename = File.expand_path(f)
  # Block form of gsub, so the backslash isn't treated as an escape
  win_filename = unix_filename.gsub('/') { '\\' }
  win_filename.sub!(/\A\\work/, 'R:')

  # Call the transformation, macro, whatever
  resp = drb.wdtowml(win_filename)
  puts "Converted to WML file: #{resp}"
end

# Shut down Word on the Windows box
drb.quit

===============================================
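The path munging in that loop is the fragile part, so it may be worth pulling into a helper you can test on its own. This is a sketch with a hypothetical `unix_to_windows` method; the `/work` and `R:` defaults mirror my setup and are assumptions for yours. Unlike the regex in the loop, it refuses paths such as /workspace/... that merely share the prefix.

```ruby
# Hypothetical helper: map a *nix path under a shared mount point to the
# drive letter the Windows box uses for the same share. The defaults
# (/work mounted as R:) are assumptions; adjust them for your setup.
def unix_to_windows(path, mount: '/work', drive: 'R:')
  full = File.expand_path(path)
  unless full == mount || full.start_with?(mount + '/')
    raise ArgumentError, "#{full} is not under #{mount}"
  end
  # Swap the mount prefix for the drive letter, then flip the separators.
  # The block form of gsub avoids backslash escaping in the replacement.
  (drive + full[mount.length..-1]).gsub('/') { '\\' }
end

puts unix_to_windows('/work/reports/q1.doc')
```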

My server for the Windows box:
===============================================

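require 'drb'
require 'drb/acl'
require_relative 'wordhelper'  # wordhelper.rb, the module that does the work

PORT = 2774
HOSTNAME = 'foo.bar.com'  # This Windows box, as the clients address it

# Minimal security: refuse everyone except a few trusted hosts.
# The ACL has to be installed before the service starts.
acl = ACL.new(%w(deny all
                 allow localhost
                 allow zoo.bar.com
                 allow goo.bar.com)) # Some set of boxen you like
DRb.install_acl(acl)

# Listen for clients, exposing a WordHelper::Word instance as the front object
DRb.start_service("druby://#{HOSTNAME}:#{PORT}", WordHelper::Word.new)

# Keep running until the DRb server thread exits
DRb.thread.join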

===============================================
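Before trusting the ACL, you can check what it does with a few made-up peer addresses. `ACL#allow_addr?` from drb/acl takes an address array shaped like `Socket#peeraddr` ([family, port, hostname, ip]); the hosts and IPs below are invented. With the default deny-then-allow ordering, a matching allow entry wins over `deny all`.

```ruby
require 'drb/acl'

# Same rule list as the server: deny everyone, then allow a few hosts.
acl = ACL.new(%w(deny all
                 allow localhost
                 allow zoo.bar.com
                 allow goo.bar.com))

# Made-up peers in Socket#peeraddr shape: [family, port, hostname, ip]
trusted  = ['AF_INET', 50000, 'zoo.bar.com', '10.0.0.5']
stranger = ['AF_INET', 50000, 'evil.example.com', '10.6.6.6']

puts acl.allow_addr?(trusted)   # matched by "allow zoo.bar.com"
puts acl.allow_addr?(stranger)  # only "deny all" matches
```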

The WordHelper module, where the work is done:
===============================================

require 'win32ole'

module WordHelper
  class Word
    # File format codes passed to Document.SaveAs
    WORD_HTML = 8    # Ugly, don't use
    WORD_XML  = 11   # Much nicer, you should use this
    WORD_95   = 106  # Help old programs
    WORD_DOC  = 0    # The regular filetype

    attr_reader :wd, :wrd

    def start_word
      # WIN32OLE sometimes barfs, so try two ways: attach to a Word
      # that's already running, and start a new one only if that fails
      begin
        @wd = WIN32OLE.connect('Word.Application')
      rescue WIN32OLERuntimeError
        @wd = WIN32OLE.new('Word.Application')
      end
      @wrd = @wd  # one instance; the macro example calls @wrd.Run

      # Set this to false if you want to run invisibly.
      # Be warned: you'll end up with a lot of zombie Word
      # processes if you're not careful
      @wd.Visible = true
      true  # don't try to ship the OLE objects back over DRb
    end

    # Word to WordprocessingML (xml)
    def wdtowml(file)
      # Expect a proper Windows-ready filename
      doc = @wd.Documents.Open(file)
      # Drop a trailing .doc and append .wml, so we never save over the input
      new_filename = file.sub(/\.doc\z/i, '') + '.wml'
      doc.SaveAs(new_filename, WORD_XML)
      doc.Close()
      new_filename
    rescue => e
      # Just fail blindly on errors. Word is shut down here, so call
      # start_word again before the next conversion
      @wd.Quit() rescue nil
      raise "Word failed on #{file}: #{e.message}"
    end

    # Almost the same method, just as an example
    def wdtohtml(file)
      # Expect a proper Windows-ready filename
      doc = @wd.Documents.Open(file)
      new_filename = file.sub(/\.doc\z/i, '') + '.html'
      doc.SaveAs(new_filename, WORD_HTML)
      doc.Close()
      new_filename
    rescue => e
      @wd.Quit() rescue nil
      raise "Word failed on #{file}: #{e.message}"
    end

    def quit
      @wd.Quit()
      @wd = @wrd = nil
    end
  end
end # of WordHelper module

===============================================
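The filename logic in `wdtowml` and `wdtohtml` is worth testing on its own, because a name that doesn't end in .doc must never map back onto itself (SaveAs would then overwrite the input). Here's a sketch of a hypothetical shared helper, `conversion_target`, using the same SaveAs format codes as the constants above (8 for HTML, 11 for XML):

```ruby
# Hypothetical helper: pick the output filename and the SaveAs format
# code for a target extension (8 = HTML, 11 = XML, as in WordHelper).
SAVE_FORMATS = { 'html' => 8, 'wml' => 11 }.freeze

def conversion_target(file, ext)
  format = SAVE_FORMATS.fetch(ext) { raise ArgumentError, "no format for .#{ext}" }
  # Strip a trailing .doc (any case) and always append the new extension,
  # so a file without .doc never ends up saved over itself.
  [file.sub(/\.doc\z/i, '') + ".#{ext}", format]
end

p conversion_target('R:\\reports\\Q1.DOC', 'wml')
p conversion_target('R:\\reports\\notes.rtf', 'html')
```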

Another example, using macros or their Ruby equivalent:

Now, if you know that your Word instance will always have a set of macros (from a template, say), you can call them thusly:
===============================================

    def wdrunmacro(file, macro)
      # Expect a proper Windows-ready filename
      doc = @wd.Documents.Open(file)
      @wrd.Run("TheMacroIAlwaysRun", doc)
      @wrd.Run(macro, doc)  # the macro name passed in
      doc.Save()
      doc.Close()
      file  # saved in place, so hand back the same name
    rescue => e
      @wd.Quit() rescue nil
      raise "Word failed on #{file}: #{e.message}"
    end

===============================================

The above suffers from relying on macros being available whenever the method is called. With a little work, you should be able to translate your VBA macros into Ruby code, callable from anywhere.
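One way to make Ruby "macros" callable by name, the way `wdrunmacro` calls Word macros, is a registry of lambdas on the server side. This is a sketch with hypothetical names (`MACROS`, `run_macro`, and the two toy macros); in real use `doc` would be the WIN32OLE document, while here a plain Hash stands in so it runs anywhere.

```ruby
# Hypothetical registry of Ruby "macros": named lambdas that take a
# document. A Hash stands in for the WIN32OLE document in this sketch.
MACROS = {
  'strip_title'  => ->(doc) { doc[:title] = doc[:title].strip },
  'upcase_title' => ->(doc) { doc[:title] = doc[:title].upcase }
}.freeze

# Look a macro up by name and run it, failing loudly on unknown names
# instead of depending on whatever template Word happens to have loaded.
def run_macro(name, doc)
  macro = MACROS.fetch(name) { raise ArgumentError, "unknown macro: #{name}" }
  macro.call(doc)
  doc
end

doc = { title: '  quarterly report ' }
%w(strip_title upcase_title).each { |m| run_macro(m, doc) }
p doc
```

Because the lookup happens in Ruby, a missing macro becomes an ordinary exception that DRb carries back to the caller.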

Here’s a stupid example that checks the first character of each Body paragraph following a Heading 1 paragraph for weirdness. If it’s weird, the code deletes that first character, removes all the character formatting from the Body paragraph, and styles the paragraph Heading 2 (I said it was stupid).

Note: This is written in a very VBAish way, which may or may not be good for you, since it’s a pretty direct mapping.
===============================================

def deletestupid(doc)
  doc.Paragraphs.each do |para|
    if para.Style.NameLocal =~ /Heading\s?1/
      body = para.Next     # Next is nil (Nothing) for the last paragraph
      next if body.nil?
      r = body.Range       # So you can talk about characters
      if body.Style.NameLocal =~ /Body/
        ch = r.Characters.First
        # Skip empty paragraphs: their only character is the paragraph
        # mark, and deleting it would merge two paragraphs
        unless ch.Text == "\r" || ch.Text =~ /[ A-Za-z0-9]/
          # could also be r.Characters(1).Delete()
          ch.Delete()

          # Blast away character formatting
          body.Range.Font.Reset()

          # Apply a new paragraph style
          body.Style = doc.Styles("Heading 2")
        end
      end
    end
  end
end

===============================================
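To check the rule itself without Word, here's a sketch that applies the same logic to plain Ruby structs. `Para` is a hypothetical stand-in for Word's paragraph objects, and its `formatted` flag stands in for `Font.Reset()`.

```ruby
# Hypothetical stand-in for a Word paragraph: its style name, its text,
# and whether it still carries manual character formatting.
Para = Struct.new(:style, :text, :formatted)

# Same rule as deletestupid: for each Body paragraph right after a
# Heading 1, if the first character is "weird", delete it, clear the
# formatting, and restyle the paragraph as Heading 2.
def delete_stupid(paras)
  paras.each_cons(2) do |heading, body|
    next unless heading.style =~ /Heading\s?1/ && body.style =~ /Body/
    first = body.text[0]
    # Empty paragraphs and ordinary first characters are left alone
    next if first.nil? || first =~ /[ A-Za-z0-9]/
    body.text = body.text[1..-1]  # delete the first character
    body.formatted = false        # stands in for Font.Reset()
    body.style = 'Heading 2'
  end
  paras
end

doc = [
  Para.new('Heading 1', 'Intro', false),
  Para.new('Body Text', '*Weird start', true),
  Para.new('Heading 1', 'Next', false),
  Para.new('Body Text', 'Fine start', true)
]
delete_stupid(doc).each { |para| puts "#{para.style}: #{para.text}" }
```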