JRuby + Jetty

June 6th, 2007

I finally figured out how to get JRuby to serve a Jetty servlet today (thanks to Charles). The key was flipping what I’d been trying to do for a while (getting Jetty to run JRuby). Here’s code that subclasses Jetty’s AbstractHandler class pretty trivially:

$ cat jetty_example.jrb
require 'java'
include_class 'javax.servlet.ServletException'
include_class 'javax.servlet.http.HttpServlet'
include_class 'javax.servlet.http.HttpServletRequest'
include_class 'javax.servlet.http.HttpServletResponse'

include_class 'org.mortbay.jetty.Server'
include_class 'org.mortbay.jetty.servlet.Context'
include_class 'org.mortbay.jetty.servlet.ServletHolder'
include_class 'org.mortbay.jetty.handler.AbstractHandler'

class SimpleHandler < AbstractHandler
  def handle(target, request, response, dispatch)
    response.setContentType("text/html")
    response.setStatus(HttpServletResponse::SC_OK)
    response.getWriter().println("<h1>Goodbye, cruel monoglot world!</h1>")
    request.setHandled(true)
  end
end

handler = SimpleHandler.new
server = Server.new(8080)
server.setHandler(handler)
server.start()

To run, add Jetty to your classpath:

$ export CLASSPATH="/path/to/jetty-6.1.3.jar:/.../jetty-util-6.1.3.jar:/.../servlet-api-2.5-6.1.3.jar"

Then it’s just a normal JRuby invocation:

$ jruby jetty_example.jrb

It’s trivial code at this point (and doesn’t handle concurrent requests, maxing out at 6.47r/s across my network), but at least it’s got me started.

[UPDATE: I can get the non-concurrent request handling way down with just a few simple tweaks (mainly running JRuby in SERVER mode) and running ab locally ;-)]

The Code Behind DocBook Elements in the Wild

May 1st, 2007

[UPDATE: Added a link to the categorized CSV file below]

Here’s some of the nitty-gritty behind DocBook Elements in the Wild. We’re trying to get a count of all of the element names in a set of 49 DocBook 4.4 <book>s.

First, go ask the O’Reilly product database for all the books that were sent to the printer in 2006. Because I’m better at XML than Unix text tools, ask for mysql -X. Now we’ve got something like:

<resultset statement="select...">
  <row>
        <field name="isbn13">9780596101619</field>
        <field name="title">Google Maps Hacks</field>
        <field name="edition">1</field>
        <field name="book_vendor_date">2006-01-05</field>
  </row>
  <row>
        <field name="isbn13">9780596008796</field>
        <field name="title">Excel Scientific and Engineering Cookbook</field>
        <field name="edition">1</field>
        <field name="book_vendor_date">2006-01-06</field>
  </row>
  <row>
        <field name="isbn13">9780596101732</field>
        <field name="title">Active Directory</field>
        <field name="edition">3</field>
        <field name="book_vendor_date">2006-01-06</field>
  </row>
  ...

Next, fun with XMLStarlet:

$ xml sel -t -m "//field[@name='isbn13']" -v '.' -n books_in_2006.xml
9780596101619
9780596008796
9780596101732
9780596009441
...
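For anyone without XMLStarlet handy, the same extraction can be sketched in Ruby with REXML. This is a minimal sketch rather than what I actually ran: the helper name isbn13s and the two-row sample are made up for illustration, and only the element and attribute names come from the result set above.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the text of every <field name="isbn13">
# in a `mysql -X` result document, in document order.
def isbn13s(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//field[@name='isbn13']").map { |field| field.text }
end

# A two-row sample shaped like the result set shown above.
SAMPLE_RESULTSET = <<XML
<resultset statement="select...">
  <row>
    <field name="isbn13">9780596101619</field>
    <field name="title">Google Maps Hacks</field>
  </row>
  <row>
    <field name="isbn13">9780596008796</field>
    <field name="title">Excel Scientific and Engineering Cookbook</field>
  </row>
</resultset>
XML

# Prints one ISBN per line, like the xml sel command.
puts isbn13s(SAMPLE_RESULTSET)
```

The XPath expression is the same one passed to xml sel, so the two approaches should agree on any well-formed result set.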

Now, pull the content down from our Atom Publishing Protocol repository and make a big document with XIncludes:

#!/usr/bin/env ruby
require 'cgi' # CGI.escape is used below
require 'kurt'
require 'rexml/document'
OUTFILE = "aggregate.xml"
files_downloaded = []
ARGV.each {|atom_id|
  entry = Atom::Entry.get_entry("#{Kurt::PROD_RESOURCES}/#{CGI.escape(atom_id)}")
  filename = atom_id.gsub(/\W/, '') + ".xml"
  File.open(filename, "w") {|f|
    f.print entry.content
  }
  files_downloaded << filename
}

agg = REXML::Document.new
agg.add_element("books")
agg.root.add_namespace("xi", "http://www.w3.org/2001/XInclude")
files_downloaded.each {|file|
  xi = agg.root.add_element("xi:include")
  xi.add_attribute("href", file)
}
File.open(OUTFILE, "w") {|f|
  agg.write(f, 2)
}

Resolve all of the XIncludes into one big file:

$ xmllint --xinclude -o aggregate.xml aggregate.xml 

It’s now pretty huge (well, huge in my world):

$ du -h aggregate.xml
102M    aggregate.xml

At this point, we’re ready to do the real counting of the elements (slow REXML solution commented out in favor of a libxml-based solution):

#!/usr/bin/env ruby
require 'rexml/parsers/pullparser'
require 'rubygems'
require 'xml/libxml'
start = Time.now
ARGV.each {|filename|
  counts = Hash.new
#  parser = REXML::Parsers::PullParser.new(File.new(filename))
#  while parser.has_next?
#    el = parser.pull
#    if el.start_element?
#      element_name = el[0]
#      if counts[element_name]
#        counts[element_name] += 1
#      else
#        counts[element_name] = 1
#      end
#    end
#  end
  parser = XML::SaxParser.new
  parser.filename = filename
  parser.on_start_element {|element_name, _|
    if counts[element_name]
      counts[element_name] += 1
    else
      counts[element_name] = 1
    end
  }
  parser.parse

  File.open(filename + ".count.csv", "w") {|f|
    counts.each {|element_name, count|
      f.puts "\"#{element_name}\",#{count}"
    }
  }
}

(Hooray for stream parsing, as this 100MB file was cranked through in 27 seconds on a 700MHz box!)
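To see what the counting produces without a 100MB file, here is a small sketch of the same tally on an inline document. It uses the REXML pull parser (the slow path from the script above) only because it needs no extra gems; the helper name count_elements and the sample XML are invented for illustration.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: tally start tags by element name.
def count_elements(xml)
  counts = Hash.new(0) # missing keys start at 0, so no if/else is needed
  parser = REXML::Parsers::PullParser.new(xml)
  while parser.has_next?
    event = parser.pull
    # For a start_element event, index 0 holds the element name.
    counts[event[0]] += 1 if event.start_element?
  end
  counts
end

# A tiny stand-in for the aggregate file.
SAMPLE_BOOK = "<book><chapter><para/><para/></chapter><chapter><para/></chapter></book>"

# Emit the same quoted-name,count CSV rows as the script above.
count_elements(SAMPLE_BOOK).each { |name, count| puts "\"#{name}\",#{count}" }
```

Because only a hash of counters is kept in memory, the approach scales with the number of distinct element names rather than with the size of the file.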

Finally, we’ve got CSV and we can do some graphing. Here’s the full CSV and the categorized CSV. Rather than working on a code-based graphing solution, I just messed with Excel. The result:

DocBook Elements from 49 Books

Here’s my favorite, a drill-down based on a categorization I just made up (click through for the drill-down):

DocBook Elements from 49 Books, Categorized

Books used:

APP Interop Pictures

April 17th, 2007

I took a couple of quick shots of the group:

The assembled crowd at the APP interop

and the grid:

The interop grid at the APP interop

More details on my xml.com blog post.

Tim Bray has better photos here.

JRuby + JFreeChart = Sparklines

April 13th, 2007

Inspired by how easy it was to get JFreeChart working and some code from former colleague Andrew Bruno, I thought it’d be nice to write some JRuby to generate Edward Tufte’s Sparklines.

Here’s some simple example code on a semi-random dataset:

# Mostly inspired by
# http://left.subtree.org/2007/01/15/creating-sparklines-with-jfreechart/
# have JFreeChart in your classpath, obviously, as well as jcommon.jar
require 'java'

module Graph
  class Sparkline
    include_class 'java.io.File'
    include_class 'org.jfree.chart.ChartUtilities'
    include_class 'org.jfree.chart.JFreeChart'
    include_class 'org.jfree.chart.axis.NumberAxis'
    include_class 'org.jfree.chart.plot.XYPlot'
    include_class 'org.jfree.chart.renderer.xy.StandardXYItemRenderer'
    include_class 'org.jfree.data.xy.XYSeries'
    include_class 'org.jfree.data.xy.XYSeriesCollection'
    include_class 'org.jfree.chart.plot.PlotOrientation'

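    def initialize(width=200, height=80, data=[])
      @width = width
      @height = height
      if data.empty?
        dataset = create_sample_data()
      else
        # assumes data is a plain list of y-values, plotted against their index
        series = XYSeries.new("Sparkline")
        data.each_with_index {|y, x| series.add(x, y) }
        dataset = XYSeriesCollection.new
        dataset.addSeries(series)
      end
      @chart = create_chart(dataset)
    end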

    def render_to_file(filename, format="png")
      javafile = java.io.File.new(filename)
      ChartUtilities.saveChartAsPNG(javafile, @chart, @width, @height)
    end

    private
    def create_sample_data
      series = XYSeries.new("Sparkline")
      data = [20]
      (1..99).each {|x|
        y = (data.last + (rand(x) + 1)) / 2
        data << y
        series.add(x, y)
      }

      dataset = XYSeriesCollection.new
      dataset.addSeries(series)
      return dataset
    end

    def create_chart(dataset)
      x = NumberAxis.new
      x.setTickLabelsVisible(false)
      x.setTickMarksVisible(false)
      x.setAxisLineVisible(false)
      x.setNegativeArrowVisible(false)
      x.setPositiveArrowVisible(false)
      x.setVisible(false)

      y = NumberAxis.new
      y.setTickLabelsVisible(false)
      y.setTickMarksVisible(false)
      y.setAxisLineVisible(false)
      y.setNegativeArrowVisible(false)
      y.setPositiveArrowVisible(false)
      y.setVisible(false)

      plot = XYPlot.new
      plot.setDataset(dataset)
      plot.setDomainAxis(x)
      plot.setDomainGridlinesVisible(false)
      plot.setDomainCrosshairVisible(false)
      plot.setRangeGridlinesVisible(false)
      plot.setRangeCrosshairVisible(false)
      plot.setRangeAxis(y)
      plot.setOutlinePaint(nil)
      plot.setRenderer(StandardXYItemRenderer.new(StandardXYItemRenderer::LINES))

      chart = JFreeChart.new(nil, JFreeChart::DEFAULT_TITLE_FONT, plot, false)
      chart.setBorderVisible(false)
      return chart
    end

  end # class Sparkline
end # module Graph

sp = Graph::Sparkline.new
puts "Rendering sparkline"
sp.render_to_file("sparkline.png")

And the resulting sparkline chart:
An Example Sparkline Chart

Code: http://kfahlgren.com/code/sparkline.jrb

UPDATE: Removed some of the useless sample generation code
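The "semi-random" sample data is just a smoothed random walk: each point averages the previous value with a random integer between 1 and x. Pulled out of the JFreeChart code it can be tried in plain Ruby. This is a sketch only; the method name sample_series and its parameters are mine, not part of the script above.

```ruby
# Plain-Ruby sketch of the sample series: start at 20, then average the
# previous value with a random integer in 1..x (integer division).
def sample_series(points = 100, start = 20)
  data = [start]
  (1...points).each do |x|
    data << (data.last + (rand(x) + 1)) / 2
  end
  data
end

series = sample_series
puts "#{series.size} points, min #{series.min}, max #{series.max}"
```

Averaging with the previous point is what keeps the line from jumping around too wildly, while the growing upper bound on the random term lets the series drift upward over time.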

More JRuby Play: JFreeChart

April 12th, 2007

I’ve been messing around at work trying to make some automated scheduling charts (basically Gantt-like) in Ruby. I’ve implemented it a couple of times using SVG::Graph, which is close to what I need, but I end up having to rewrite a lot of methods whenever I really start using it. It occurred to me today that I might be able to co-opt a sexy Java library to do my dirty work. JFreeChart to the rescue!

As before, I’m generally amazed at how little work goes into integrating Java and JRuby these days. It’s a testament to the JRuby team and to the wealth of well-written, well-documented Java libraries out there.

Here’s some toy code that makes a simple Gantt chart and saves it as a PNG to a file:

# have jfreechart.jar in your classpath, obviously, as well as jcommon.jar
# and use a recent jruby
require 'java'
module Gantt
  class Simple
    include_class 'org.jfree.chart.ChartFactory'
    include_class 'org.jfree.chart.ChartUtilities'
    include_class 'org.jfree.chart.JFreeChart'
    include_class 'org.jfree.data.gantt.Task'
    include_class 'org.jfree.data.gantt.TaskSeries'
    include_class 'org.jfree.data.gantt.TaskSeriesCollection'
    include_class 'org.jfree.data.time.SimpleTimePeriod'
    include_class 'java.lang.System'
    include_class 'java.io.File'

    MILLIS_IN_A_DAY = 86400000

    def initialize(title="Chunky Bacon", width=700, height=400, data=[])
      @width = width
      @height = height
      @title = title
      dataset = create_sample_data() if data.empty?
      @chart = create_chart(dataset)
    end

    def render_to_file(filename, format="png")
      javafile = java.io.File.new(filename)
      ChartUtilities.saveChartAsPNG(javafile, @chart, @width, @height)
    end

    private
    def create_sample_data
      # dates as milliseconds seem the easiest
      now = System.currentTimeMillis
      tomorrow = now + (MILLIS_IN_A_DAY * 1)
      day_after_tomorrow = now + (MILLIS_IN_A_DAY * 2)
      week_from_today = now + (MILLIS_IN_A_DAY * 7)

      s1 = TaskSeries.new("JRuby")
      s1.add(Task.new("Download JRuby",
                      SimpleTimePeriod.new(now, tomorrow)))
      s1.add(Task.new("Write Code",
                      SimpleTimePeriod.new(tomorrow, day_after_tomorrow)))
      s1.add(Task.new("Setup CLASSPATH",
                      SimpleTimePeriod.new(day_after_tomorrow, week_from_today)))

      s2 = TaskSeries.new("Java")
      s2.add(Task.new("Read Comics",
                      SimpleTimePeriod.new(now, tomorrow)))
      s2.add(Task.new("Write Code",
                      SimpleTimePeriod.new(tomorrow, day_after_tomorrow)))
      s2.add(Task.new("Setup CLASSPATH",
                      SimpleTimePeriod.new(day_after_tomorrow, week_from_today)))

      collection = TaskSeriesCollection.new
      collection.add(s1)
      collection.add(s2)
      return collection
    end

    def create_chart(dataset)
      opts = {
              :title => @title,
              :domain_axis_label => "Task",
              :range_axis_label => "Date",
              :data => dataset,
              :include_legend => true,
              :tooltips => false,
              :urls => false
             }
      chart = ChartFactory.createGanttChart(
                                              opts[:title],
                                              opts[:domain_axis_label],
                                              opts[:range_axis_label],
                                              opts[:data],
                                              opts[:include_legend],
                                              opts[:tooltips],
                                              opts[:urls]
                                            )
      return chart
    end

  end # class Simple
end # module Gantt

chart = Gantt::Simple.new("Gantt Chart Demo")
puts "Rendering chart"
chart.render_to_file("simplegantt.png")

Example PNG:

A Simple Gantt Chart Example

Code: http://kfahlgren.com/code/simple_gantt.jrb
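The date handling in create_sample_data is nothing more than arithmetic on epoch milliseconds. Here is a minimal sketch of that arithmetic in plain Ruby, assuming Time.now as a stand-in for java.lang.System.currentTimeMillis; the helper names current_time_millis and days_from are hypothetical.

```ruby
MILLIS_IN_A_DAY = 86_400_000 # 24 * 60 * 60 * 1000

# Stand-in for java.lang.System.currentTimeMillis in plain Ruby.
def current_time_millis
  (Time.now.to_f * 1000).to_i
end

# Hypothetical helper: epoch milliseconds for `days` days after `from`.
def days_from(from, days)
  from + MILLIS_IN_A_DAY * days
end

now = current_time_millis
[0, 1, 2, 7].each do |days|
  millis = days_from(now, days)
  # Time.at takes seconds, so convert back from milliseconds.
  puts "#{days} day(s) out: #{millis} ms = #{Time.at(millis / 1000.0)}"
end
```

Fixed 24-hour offsets ignore daylight-saving shifts, which is fine for toy data but worth remembering for a real schedule.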

Hiding Complexity

April 7th, 2007

I just started reading the second edition of The Ruby Way by Hal Fulton and came across this gem:

We can’t avoid complexity, but we can push it around. We can bury it out of sight. This is the old “black box” principle at work; a black box performs a complex task, but it possesses simplicity on the outside.

This idea of managing complexity is one of the classic commandments of programming, of course, and a core theme of Structure and Interpretation of Computer Programs, but this was a nice restatement.

It looks like this edition (in all its 800+ page glory) will be quite a treat.

Jane Street Capital Is On To Me

March 26th, 2007

If you search for the right strings, you’ve probably already seen the Jane Street Capital ads for OCaml programmers in Gmail or elsewhere, but today I got a new one that really cracked me up:

Jane Street Capital ad: Do you think In Closures? We do too!

I’ll even link to the ad because it was so funny… and I do think in closures, and you certainly got my attention!

“Literate” Programming, Technical Writing

March 16th, 2007

There’s been some recent discussion on ruby-talk about “literate” programming after the new O’Reilly title Beautiful Code was announced (Matz has written an essay for it). Matz’s response made me listen to all-things-Knuth, so I was pleased to read Philip Wadler’s post today on Three ways to improve your writing, which includes a PDF link to Knuth (and others) lecturing on writing and “literate” programming. I’ve just started reading it, but have already found two laugh-worthy gems:

13. Many readers will skim over formulas on their first reading of your exposition. Therefore, your sentences should flow smoothly when all but the simplest formulas are replaced by “blah” or some other grunting noise.

20. Some handy maxims:
Watch out for prepositions that sentences end with.
When dangling, consider your participles.
About them sentence fragments.
Make each pronoun agree with their antecedent.
Don’t use commas, which aren’t necessary.
Try to never split infinitives.

I know the first is certainly true for me as I’ve been trying to wade through Haskell (a language into “literate programming”) introductions recently, which are very math-heavy.

Borrowing Java’s XSLT Support for Ruby

March 2nd, 2007

Well, I finally caught up with the crowd and got JRuby running on one of my dev boxes. I’d been interested in it from the get-go because Ruby lacks any support for internal XSLT processing. All those system()s were starting to get me down, especially as I’m trying to get a DocBook->PDF rendering webservice to be a lot faster. Much to my surprise, I was able to get simple transforms working in almost no time (thanks in part to lots of help). Without further ado, here’s a simple library for XSLT transforms using either Xalan-J or Saxon (make sure you have the jars for both in your CLASSPATH):

require 'java'
module JXslt
  include_class "javax.xml.transform.TransformerFactory"
  include_class "javax.xml.transform.Transformer"
  include_class "javax.xml.transform.stream.StreamSource"
  include_class "javax.xml.transform.stream.StreamResult"
  include_class "java.lang.System"

  class XsltProcessor
    def transform(xslt,infile,outfile)
      transformer = @tf.newTransformer(StreamSource.new(xslt))
      transformer.transform(StreamSource.new(infile), StreamResult.new(outfile))
    end
  end # XsltProcessor
  class Saxon < XsltProcessor
    TRANSFORMER_FACTORY_IMPL = "net.sf.saxon.TransformerFactoryImpl"
    def initialize
      System.setProperty("javax.xml.transform.TransformerFactory", TRANSFORMER_FACTORY_IMPL)
      @tf = TransformerFactory.newInstance
    end
  end
  class Xalan < XsltProcessor
    TRANSFORMER_FACTORY_IMPL = "org.apache.xalan.processor.TransformerFactoryImpl"
    def initialize
      System.setProperty("javax.xml.transform.TransformerFactory", TRANSFORMER_FACTORY_IMPL)
      @tf = TransformerFactory.newInstance
    end
  end
end 

# if you wanted to run this from the command line, do something like
# $ jruby lib/jxslt.rb a.xsl in.xml out.xml
xalan = JXslt::Xalan.new
xalan.transform(*ARGV)
#saxon = JXslt::Saxon.new
#saxon.transform(*ARGV)

Big props to Charles for helping me get going and writing the first version of the above.

darcs get http://kfahlgren.com/code/lib/jxslt/ or jxslt.rb

Exploiting FrameMaker MIF as XML, Reading Bookfiles

February 25th, 2007

[Read this for an introduction to what I'm talking about].

Now that we’ve got our FrameMaker documents in XML, how can we exploit their new format? One of the first things I did was to create new ways of reading (eventually changing) the simple data stored within them. This isn’t all that earth-shattering, but when you consider how difficult it is to find and change some values in the FrameMaker UI this is a big win. Where to start? Bookfiles.

To be able to apply stylesheets or data-collection tools to books (rather than individual files), I need to be able to collect a book’s components. So, convert your bookfile to MX (yeah, it works on bookfiles as well as chapter files), and search through it for one of the filenames you know is a part of the book (you’ll probably want to pretty-print the XML first). I get something like this:

  <BookComponent>
    <FileName>`&lt;c\&gt;ch01'</FileName>
    <Unique>27107</Unique>
    <StartPageSide>StartRightSide</StartPageSide>
    <PageNumbering>Restart</PageNumbering>
    <PgfNumbering>Continue</PgfNumbering>
    <PageNumPrefix>`'</PageNumPrefix>
    <PageNumSuffix>`'</PageNumSuffix>
    <DefaultPrint>Yes</DefaultPrint>
    <DefaultApply>Yes</DefaultApply>
  </BookComponent>

MX in this case has a pretty comprehensible structure, so we’ll need to grab a BookComponent/FileName, do a little text processing to remove the funky characters, and potentially append our MX file extension (I chose “.mx”). Here’s a very simple stylesheet to do just that:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="@*|node()">
    <xsl:apply-templates/>
  </xsl:template>  

  <xsl:template match="/">
    <xsl:element name="components">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="//BookComponent/FileName">
    <xsl:param name="extension" select="'.mx'"/>
    <xsl:variable name="str-after">
      <xsl:value-of select="substring-after(., '>')"/>
    </xsl:variable>
    <xsl:element name="component">
      <xsl:value-of select="substring($str-after,
                                      1,
                                      string-length($str-after) - 1)"/>
      <xsl:value-of select="$extension"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

When you run that on an MX bookfile, you should see an output like this (note that the file extension is customizable above):

<?xml version="1.0"?>
<components>
  <component>svcTOC.fm.mx</component>
  <component>foreword.mx</component>
  <component>ch00.mx</component>
  <component>ch01.mx</component>
  <component>ch02.mx</component>
  <component>ch03.mx</component>
  <component>ch04.mx</component>
  <component>ch05.mx</component>
  <component>ch06.mx</component>
  <component>ch07.mx</component>
  <component>ch08.mx</component>
  <component>ch09.mx</component>
  <component>appa.mx</component>
  <component>appb.mx</component>
  <component>appc.mx</component>
  <component>appd.mx</component>
  <component>appe.mx</component>
  <component>svcIX.fm.mx</component>
  <component>svcAPL.fm.mx</component>
  <component>svcLOR.fm.mx</component>
</components>

That’ll give us a nice structure to direct other processes to the individual component files.
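If the XPath string functions in the FileName template are hard to follow, the same cleanup can be mirrored in plain Ruby. This is only a sketch of the logic, not part of the MX tooling; the method name component_filename is made up, and the sample input is the FileName value from the BookComponent shown earlier, after XML unescaping.

```ruby
# Mirrors the XPath in the FileName template: keep what follows the first
# '>', drop the trailing quote character, then append the extension.
def component_filename(raw, extension = ".mx")
  str_after = raw.partition(">").last # like substring-after(., '>')
  str_after[0...-1] + extension       # like substring(s, 1, string-length(s) - 1)
end

puts component_filename("`<c\\>ch01'") # prints "ch01.mx"
```

As in the stylesheet, the extension is a parameter, so component_filename(raw, ".xml") would yield ch01.xml instead.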

The code is also available here or darcs get http://kfahlgren.com/code/mx/.