Archive for the ‘XSLT’ Category

O’Reilly Release ePubs

Tuesday, July 15th, 2008

As of today, 30 O’Reilly titles are available as ebook bundles, and many will be in the Kindle Store later today:

As promised last month, O’Reilly has released 30 titles as DRM-free downloadable ebook bundles. The bundles include three ebook formats (EPUB, PDF, and Kindle-compatible Mobipocket) for a single price — at or below the book’s cover price.

I’ve spent a reasonable chunk of my year helping make this happen, both on the O’Reilly side and by adding .epub support to the DocBook-XSL stylesheets with Paul Norton of Adobe. Hopefully, our customers will be happy with the new formats.

DocBook-XSL Stylesheets have >600 Parameters

Wednesday, June 13th, 2007

Norm Walsh writes:

Stylesheets can have literally hundreds of parameters. The DocBook XSL Stylesheets have more than six hundred.

All I can say at this point is: wow. Grepping the core of our own customization shows 121 <xsl:param>s (about 20 of which we introduced) and 52 <xsl:attribute-set>s (20, again). Thinking about it now (as I haven’t before), we’ve probably minimized that number by completely overriding 13 of the “regular” fo/ stylesheets directly (rather than using params or smaller, single-template overrides). The DocBook-XSL stylesheets are a truly impressive, complex project.

Their complexity brings me to the other DocBook-related news item from today, in which Bob DuCharme argues that XHTML 2:

will hit a sweet spot between the richness of DocBook and the simplicity of XHTML 1

I’m certainly hopeful that our work in the DocBook SubCommittee for Publishers will move a subset of DocBook closer to that “sweet spot”.

Partial Updates: A Simpler Strawman?

Sunday, June 10th, 2007

James Snell has been working on some interesting things as work on the Atom Publishing Protocol spec winds down. Most recently, he posted some thoughts on how to effectively communicate partial updates to APP servers using HTTP PATCH.

[UPDATE: James points out the obvious drawback to this approach in his response.]

One of the things that surprised me when I met other APP implementors at the interop was the relative lack of concern they seemed to have about the actual content inside their <atom:entry>s. This may have simply been a simplification on their part for the sake of testing (“if it can accept a single line of XHTML div, it can accept anything, essentially”) rather than their real views, but to someone very concerned about perfect content fidelity, it sorta scared me. These tiny <atom:entry>s might hide some of the problems that APP will face in the wild, particularly for document repositories.

Long before the interop, we’d decided internally at O’Reilly to use the Media Resources rather than the <atom:entry> container (in large part because of the size of our DocBook documents, often over 2MB) for our document repository implementation. Because of the larger size of our content blocks, the sort of partial updates that James is thinking about might be quite cool.

The core of James’ strawman is an XML delta syntax (with credit due to Andy Roberts’ work on the same) for HTTP PATCH with 8 operations: insert-before, insert-after, insert-child, replace, remove, remove-all, set-attribute and remove-attribute. Coming at this problem with my experience in document transformation and XSLT, I saw 7 of those operations (everything but ‘replace’) as unnecessary. The basic inspiration is thinking about each operation as an XSLT template. Mentally translate the d:replace/@path into xsl:template/@match and swap the bodies and you’ll be with me (with luck!).

Here’s the specific rundown of the 7 operations other than ‘replace’ working with James’ simple example <atom:entry>:

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/foo/boo</id>
  <title>Test</title>
  <updated>2007-12-12T12:12:12Z</updated>
  <summary>Test summary</summary>
  <author>
    <name>James</name>
  </author>
  <link href="http://example.org"/>
</entry>

Note: You’ll have to imagine these working on a much larger XML document than my examples to understand the importance.

insert-before

PATCH /collection/entry/1 HTTP/1.1
Host: example.org
Content-Type: application/delta+xml
Content-Length: nnnn

<d:delta
  xmlns:d="http://purl.org/atompub/delta"
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:b="http://example.org/foo">

  <!-- substitute for insert-before
       /atom:entry/atom:author/atom:name
       an atom:email -->
  <d:replace path="/atom:entry/atom:author">
    <atom:author>
      <atom:email>james@example.org</atom:email>
      <atom:name>James</atom:name>
    </atom:author>
  </d:replace>
</d:delta>

insert-after

PATCH /collection/entry/1 HTTP/1.1
Host: example.org
Content-Type: application/delta+xml
Content-Length: nnnn

<d:delta
  xmlns:d="http://purl.org/atompub/delta"
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:b="http://example.org/foo">

  <!-- substitute for insert-after
       /atom:entry/atom:author/atom:name
       an atom:uri -->
  <d:replace path="/atom:entry/atom:author">
    <atom:author>
      <atom:name>James</atom:name>
      <atom:uri>http://example.org/blogs/james</atom:uri>
    </atom:author>
  </d:replace>
</d:delta>

insert-child

PATCH /collection/entry/1 HTTP/1.1
Host: example.org
Content-Type: application/delta+xml
Content-Length: nnnn

<d:delta
  xmlns:d="http://purl.org/atompub/delta"
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:b="http://example.org/foo">

  <!-- substitute for insert-child
       /atom:entry/atom:author
       an atom:uri -->
  <d:replace path="/atom:entry/atom:author">
    <atom:author>
      <atom:name>James</atom:name>
      <atom:uri>http://example.org/blogs/james</atom:uri>
    </atom:author>
  </d:replace>
</d:delta>

remove

PATCH /collection/entry/1 HTTP/1.1
Host: example.org
Content-Type: application/delta+xml
Content-Length: nnnn

<d:delta
  xmlns:d="http://purl.org/atompub/delta"
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:b="http://example.org/foo">

  <!-- substitute for remove
       /atom:entry/atom:author/atom:name -->
  <d:replace path="/atom:entry/atom:author/atom:name">
  </d:replace>
  <!-- yeah, this atom:author is no longer valid ..-->
</d:delta>

remove-all

PATCH /collection/entry/1 HTTP/1.1
Host: example.org
Content-Type: application/delta+xml
Content-Length: nnnn

<d:delta
  xmlns:d="http://purl.org/atompub/delta"
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:b="http://example.org/foo">

  <!-- substitute for remove-all
       /atom:entry/* -->
  <d:replace path="/atom:entry/*">
  </d:replace>
  <!-- yeah, this atom:entry is no longer valid ..-->
</d:delta>

set-attribute

PATCH /collection/entry/1 HTTP/1.1
Host: example.org
Content-Type: application/delta+xml
Content-Length: nnnn

<d:delta
  xmlns:d="http://purl.org/atompub/delta"
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:b="http://example.org/foo">

  <!-- substitute for set-attribute
       /atom:entry/atom:link/@href
       to http://not-example.org -->
  <d:replace path="/atom:entry/atom:link/@href">http://not-example.org</d:replace>
</d:delta>

remove-attribute

PATCH /collection/entry/1 HTTP/1.1
Host: example.org
Content-Type: application/delta+xml
Content-Length: nnnn

<d:delta
  xmlns:d="http://purl.org/atompub/delta"
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:b="http://example.org/foo">

  <!-- substitute for remove-attribute
       /atom:entry/atom:link/@href -->
  <d:replace path="/atom:entry/atom:link">
    <atom:link/>
  </d:replace>
  <!-- you can't take the easy way and match
       the attribute, because an empty attribute
       (@attr="") means something different than
       the absence of @attr -->
  <!-- and this atom:link is no longer valid ..-->
</d:delta>

I think the above could be fairly easily implemented as a transformation into either XQuery or XSLT, but I’d imagine that it could be implemented using streaming techniques as well. Thoughts?
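For anyone who wants to poke at the replace-only idea without writing the XSLT or XQuery translation, here is a rough Python sketch that applies d:replace operations directly to a parsed entry. It is not from James' proposal or from any APP server: the function name apply_replaces is made up for illustration, it only understands simple child-step paths (optionally ending in * or an unprefixed @attribute), and it relies on ElementTree's limited XPath subset rather than a real XPath engine.

```python
import copy
import xml.etree.ElementTree as ET

NS = {
    "d": "http://purl.org/atompub/delta",
    "atom": "http://www.w3.org/2005/Atom",
}


def apply_replaces(entry_xml, delta_xml):
    """Apply each d:replace in delta_xml to entry_xml and return the new root."""
    root = ET.fromstring(entry_xml)
    delta = ET.fromstring(delta_xml)
    for op in delta.findall("d:replace", NS):
        # The first step names the document element, so drop it and treat
        # the remaining steps as a path relative to the root.
        steps = op.get("path").strip("/").split("/")[1:]
        if not steps:
            raise ValueError("replacing the document element is not handled")
        if steps[-1].startswith("@"):
            # Attribute target: the text of d:replace becomes the new value.
            owner_path = "/".join(["."] + steps[:-1])
            for owner in root.findall(owner_path, NS):
                owner.set(steps[-1][1:], (op.text or "").strip())
            continue
        # ElementTree has no parent pointers, so build a child -> parent map.
        parents = {child: parent for parent in root.iter() for child in parent}
        for target in root.findall("./" + "/".join(steps), NS):
            parent = parents[target]
            index = list(parent).index(target)
            parent.remove(target)
            # An empty d:replace body therefore acts as a plain remove.
            for offset, new_child in enumerate(op):
                parent.insert(index + offset, copy.deepcopy(new_child))
    return root


# Cut-down version of the example entry, plus two of the substitutes above.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Test</title>
  <author><name>James</name></author>
  <link href="http://example.org"/>
</entry>"""

DELTA = """<d:delta xmlns:d="http://purl.org/atompub/delta"
                    xmlns:atom="http://www.w3.org/2005/Atom">
  <d:replace path="/atom:entry/atom:author">
    <atom:author>
      <atom:email>james@example.org</atom:email>
      <atom:name>James</atom:name>
    </atom:author>
  </d:replace>
  <d:replace path="/atom:entry/atom:link/@href">http://not-example.org</d:replace>
</d:delta>"""

patched = apply_replaces(ENTRY, DELTA)
author = patched.find("atom:author", NS)
print([child.tag.split("}")[1] for child in author])  # ['email', 'name']
print(patched.find("atom:link", NS).get("href"))      # http://not-example.org
```

The sketch rebuilds the parent map for every operation, which is wasteful on a 2MB DocBook file; a real implementation would more likely compile the delta into a stylesheet once, as suggested above.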

Borrowing Java’s XSLT Support for Ruby

Friday, March 2nd, 2007

Well, I finally caught up with the crowd and got JRuby running on one of my dev boxes. I’d been interested in it from the get-go because Ruby lacks any support for internal XSLT processing. All those system()s were starting to get me down, especially as I’m trying to make a DocBook->PDF rendering webservice a lot faster. Much to my surprise, I was able to get simple transforms working in almost no time (thanks in part to lots of help). Without further ado, here’s a simple library for XSLT transforms using either Xalan-J or Saxon (make sure you have the jars for both in your CLASSPATH):

require 'java'

# Thin JRuby wrapper around JAXP (javax.xml.transform) for XSLT transforms.
module JXslt
  # java_import replaces include_class, which newer JRuby releases dropped.
  java_import "javax.xml.transform.TransformerFactory"
  java_import "javax.xml.transform.stream.StreamSource"
  java_import "javax.xml.transform.stream.StreamResult"
  java_import "java.lang.System"

  class XsltProcessor
    # Each subclass names its JAXP factory in TRANSFORMER_FACTORY_IMPL.
    def initialize
      System.setProperty("javax.xml.transform.TransformerFactory",
                         self.class::TRANSFORMER_FACTORY_IMPL)
      @tf = TransformerFactory.newInstance
    end

    # All three arguments are strings, passed to JAXP as system IDs.
    def transform(xslt, infile, outfile)
      transformer = @tf.newTransformer(StreamSource.new(xslt))
      transformer.transform(StreamSource.new(infile), StreamResult.new(outfile))
    end
  end # XsltProcessor

  class Saxon < XsltProcessor
    TRANSFORMER_FACTORY_IMPL = "net.sf.saxon.TransformerFactoryImpl"
  end

  class Xalan < XsltProcessor
    TRANSFORMER_FACTORY_IMPL = "org.apache.xalan.processor.TransformerFactoryImpl"
  end
end

# if you wanted to run this from the command line, do something like
# $ jruby lib/jxslt.rb a.xsl in.xml out.xml
if __FILE__ == $0
  xalan = JXslt::Xalan.new
  xalan.transform(*ARGV)
  #saxon = JXslt::Saxon.new
  #saxon.transform(*ARGV)
end

Big props to Charles for helping me get going and writing the first version of the above.

darcs get http://kfahlgren.com/code/lib/jxslt/ or jxslt.rb

Exploiting FrameMaker MIF as XML, Reading Bookfiles

Sunday, February 25th, 2007

[Read this for an introduction to what I'm talking about].

Now that we’ve got our FrameMaker documents in XML, how can we exploit their new format? One of the first things I did was to create new ways of reading (and eventually changing) the simple data stored within them. This isn’t all that earth-shattering, but when you consider how difficult it is to find and change some values in the FrameMaker UI, this is a big win. Where to start? Bookfiles.

To be able to apply stylesheets or data-collection tools to books (rather than individual files), I need to be able to collect a book’s components. So, convert your bookfile to MX (yeah, it works on bookfiles as well as chapter files), and search through it for one of the filenames you know is a part of the book (you’ll probably want to pretty-print the XML first). I get something like this:

  <BookComponent>
    <FileName>`&lt;c\&gt;ch01'</FileName>
    <Unique>27107</Unique>
    <StartPageSide>StartRightSide</StartPageSide>
    <PageNumbering>Restart</PageNumbering>
    <PgfNumbering>Continue</PgfNumbering>
    <PageNumPrefix>`'</PageNumPrefix>
    <PageNumSuffix>`'</PageNumSuffix>
    <DefaultPrint>Yes</DefaultPrint>
    <DefaultApply>Yes</DefaultApply>
  </BookComponent>

MX in this case has a pretty comprehensible structure, so we’ll need to grab a BookComponent/FileName, do a little text processing to remove the funky characters, and potentially append our MX file extension (I chose “.mx”). Here’s a very simple stylesheet to do just that:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="@*|node()">
    <xsl:apply-templates/>
  </xsl:template>  

  <xsl:template match="/">
    <xsl:element name="components">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- FileName text looks like `<c\>ch01': keep what follows the first '>',
       drop the trailing quote, then append the extension. -->
  <xsl:template match="BookComponent/FileName">
    <xsl:param name="extension" select="'.mx'"/>
    <xsl:variable name="str-after" select="substring-after(., '>')"/>
    <xsl:element name="component">
      <xsl:value-of select="substring($str-after,
                                      1,
                                      string-length($str-after) - 1)"/>
      <xsl:value-of select="$extension"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>

When you run that on an MX bookfile, you should see output like this (note that the file extension is customizable above):

<?xml version="1.0"?>
<components>
  <component>svcTOC.fm.mx</component>
  <component>foreword.mx</component>
  <component>ch00.mx</component>
  <component>ch01.mx</component>
  <component>ch02.mx</component>
  <component>ch03.mx</component>
  <component>ch04.mx</component>
  <component>ch05.mx</component>
  <component>ch06.mx</component>
  <component>ch07.mx</component>
  <component>ch08.mx</component>
  <component>ch09.mx</component>
  <component>appa.mx</component>
  <component>appb.mx</component>
  <component>appc.mx</component>
  <component>appd.mx</component>
  <component>appe.mx</component>
  <component>svcIX.fm.mx</component>
  <component>svcAPL.fm.mx</component>
  <component>svcLOR.fm.mx</component>
</components>

That’ll give us a nice structure to direct other processes to the individual component files.

The code is also available here or darcs get http://kfahlgren.com/code/mx/.
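If you want to sanity-check the string handling outside of XSLT, the same steps can be sketched in a few lines of Python. This is only an illustration, not part of the MX code: the function name component_names and the Book wrapper element in the sample are invented, and the only structure assumed is the BookComponent/FileName shape shown in the fragment above.

```python
import xml.etree.ElementTree as ET


def component_names(mx_book_xml, extension=".mx"):
    """Return the cleaned-up file name of every BookComponent/FileName."""
    names = []
    for component in ET.fromstring(mx_book_xml).iter("BookComponent"):
        for file_name in component.findall("FileName"):
            raw = "".join(file_name.itertext())
            # Same steps as the stylesheet: keep what follows the first '>',
            # then drop the trailing quote character.
            after = raw.partition(">")[2]
            names.append(after[:-1] + extension)
    return names


# Made-up two-component bookfile; <Book> is a placeholder wrapper element.
SAMPLE = r"""<Book>
  <BookComponent><FileName>`&lt;c\&gt;ch01'</FileName></BookComponent>
  <BookComponent><FileName>`&lt;c\&gt;appa'</FileName></BookComponent>
</Book>"""

print(component_names(SAMPLE))  # ['ch01.mx', 'appa.mx']
```

Passing a different extension argument plays the same role as the extension parameter in the stylesheet.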

Exploiting FrameMaker MIF as XML, Back into MIF

Saturday, February 3rd, 2007

[Read this for an introduction to what I'm talking about].

The first step of doing anything useful with MX is the ability to get back out into MIF. Thankfully, this is an entirely trivial job in XSLT.

[This code thanks to my boss, Andrew Savikas.]

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- author: Andrew Savikas, O'Reilly Media -->

  <xsl:output method="text" encoding="ascii"/>
  <xsl:strip-space elements="_facet"/>

  <xsl:template match="/|MIF_ROOT">
    <xsl:apply-templates/>
  </xsl:template>

<!-- This template needs to remain flush left for correct output -->
<xsl:template match="_facet">
<xsl:text>
</xsl:text>
<xsl:apply-templates/>
</xsl:template>

  <xsl:template match="*">
    <xsl:text>&lt;</xsl:text>
      <xsl:value-of select="name()"/>
    <xsl:text> </xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&gt;</xsl:text>
    <xsl:text> </xsl:text>
  </xsl:template>

</xsl:stylesheet>

The code is also available here or darcs get http://kfahlgren.com/code/mx/.
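To make the serialization rule in that stylesheet easier to see, here is a small Python approximation of it: every element becomes "<Name content> ", the MIF_ROOT wrapper contributes only its children, and a _facet starts on a fresh line with whitespace-only text inside it dropped. Treat it as a sketch under assumptions: the function name to_mif is invented, the sample input is a guess at what a tiny MX file might look like, and the ASCII output encoding from the stylesheet is not reproduced.

```python
import xml.etree.ElementTree as ET


def to_mif(elem):
    """Serialize one MX element (and its subtree) back to MIF-style text."""
    in_facet = elem.tag == "_facet"

    def keep(text):
        # Mirror xsl:strip-space: whitespace-only text inside _facet is dropped.
        text = text or ""
        return "" if in_facet and not text.strip() else text

    parts = [keep(elem.text)]
    for child in elem:
        parts.append(to_mif(child))
        parts.append(keep(child.tail))
    inner = "".join(parts)

    if elem.tag == "MIF_ROOT":
        return inner          # wrapper element: emit the children only
    if in_facet:
        return "\n" + inner   # facets start on a fresh line
    return "<" + elem.tag + " " + inner + "> "


# Illustrative input only; real MX files are far larger.
SAMPLE = "<MIF_ROOT><MIFFile>7.00</MIFFile><Units>Uin</Units></MIF_ROOT>"
print(repr(to_mif(ET.fromstring(SAMPLE))))  # '<MIFFile 7.00> <Units Uin> '
```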