REAXblog

XSL and InDesign tagged text

May 23rd 2008

One of my first freelance jobs involved importing a massive amount of XML data into InDesign CS - which had then just shipped with highly touted XML import options. While InDesign proved capable in the end, the process itself was a disaster.

At the time, I was using InDesign’s built-in XML import option. InDesign didn’t (and doesn’t) support reading XML attributes or performing any “logic” based on the XML stream. This forced me to use a ton of hacks and kludges to make the process work for something like 300 pages of automotive pricing data.

InDesign presented a few other problems as well - because it wouldn’t support attributes, we had to rewrite the XML schema to put everything in its own tag, in a manner that wasn’t especially semantic. Furthermore, InDesign choked on large XML files and processed them slowly. Despite breaking up the XML files into 200k chunks, InDesign still took 5-10 minutes to import each of the 15 chunks. This was costing my client lots of time and had me locked in my office during what turned out to be the best weather of the summer.

A few months later, in the course of another project, I picked up a little bit of XSLT, which I hadn’t understood very well up until that point. But I immediately recognized an application for the car pricing guide: I could programmatically analyze the XML stream I received and transform it into InDesign’s XML format! It seemed too perfect - I wouldn’t have to spend endless hours setting section markers, bothering with selective page breaks, or hoping that InDesign wouldn’t crap out while processing a large XML chunk (which had been a problem).

Unfortunately, it was not to be - the InDesign XML format is completely inscrutable. I wondered if it might be easier to use an XSLT stylesheet to generate InDesign Tagged Text, and that is where my project truly began.

Not an XML format

The biggest hurdle I had to overcome with InDesign Tagged Text is that it’s not an XML format, meaning I needed to use some tricky CDATA tags in order to generate the text I needed. Secondly, it’s tagged text, meaning you won’t be able to automagically insert images like you might with a straight XML import within InDesign.

I was willing to live without images because they weren’t needed for my project. I’ll revisit this later and determine if there is a relatively painless way to include images.

The first issue (a non-XML output format) can be overcome with a little bit of patience. Although XSL transformation tools are generally perfectly happy to generate non-XML output, the fact that InDesign Tagged Text (.intt) uses angle brackets throws a kink into the works. So we use CDATA tags liberally to echo raw text when we need to.

OK - enough babble. How does INTT work? At first glance, it looks somewhat like HTML, with tags which look like <ParaStyle:Header1> and so on. Unfortunately, the similarity ends there. Paragraph styles are never ‘closed’; instead, each new paragraph is marked with a new <ParaStyle:…> tag. Character styles, however, are opened with a <CharStyle:Emphasis> type tag and closed with an empty <CharStyle:> tag.
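To make the open/close rules concrete, here is a minimal sketch of a template that emits one paragraph with a character style inside it. The input element names (car, name, price) are hypothetical, the style names are borrowed from my header further down, and the brackets are written as entities so the stylesheet stays well-formed XML.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="windows-1252"/>

  <!-- Visit only the (hypothetical) car elements, so stray text is not copied out. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//car"/>
  </xsl:template>

  <xsl:template match="car">
    <!-- A paragraph starts with its ParaStyle tag; nothing ever closes it. -->
    <xsl:text>&lt;ParaStyle:aCar Info&gt;</xsl:text>
    <xsl:value-of select="name"/>
    <!-- A literal tab, to jump to the next tab stop. -->
    <xsl:text>&#9;</xsl:text>
    <!-- A character style is opened by name and closed by the empty tag. -->
    <xsl:text>&lt;CharStyle:aCar Info Prices&gt;</xsl:text>
    <xsl:value-of select="price"/>
    <xsl:text>&lt;CharStyle:&gt;</xsl:text>
    <!-- The line break ends the paragraph. -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

One caveat: if your source text can itself contain angle brackets or backslashes, they will probably need escaping with a backslash, the way the exported header writes \<TextFont\>.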

There are also table tags (which are mercifully part of the text flow in InDesign and thus we can generate them into our tagged text), which I will discuss at a later date.

But my project just used paragraph styles and character styles, so I was set. I also wanted to do some logical processing (like transforming a pricing number like 25550 into $25,550, or 1200000 into $1.2 mil) that an InDesign XML import can’t do, which involved a small amount of extra work.
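The price formatting can be sketched with a named template built on XSLT 1.0’s format-number() function. This is an illustration rather than my production code: the template name format-price, the price element, and the one-million cutoff for switching to “mil” are all assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="windows-1252"/>

  <!-- Hypothetical helper: turns a bare number into a display price. -->
  <xsl:template name="format-price">
    <xsl:param name="amount"/>
    <xsl:choose>
      <!-- Assumed cutoff: one million and up is shown as "$1.2 mil". -->
      <xsl:when test="$amount &gt;= 1000000">
        <xsl:value-of select="concat('$', format-number($amount div 1000000, '0.#'), ' mil')"/>
      </xsl:when>
      <!-- Everything else gets thousands separators, e.g. "$25,550". -->
      <xsl:otherwise>
        <xsl:value-of select="concat('$', format-number($amount, '#,##0'))"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Hypothetical price element holding a bare number such as 25550. -->
  <xsl:template match="price">
    <xsl:call-template name="format-price">
      <xsl:with-param name="amount" select="."/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

The '0.#' pattern keeps at most one decimal place, so values that are not a clean tenth of a million get rounded; adjust the pattern if you need more precision.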

Now, you remember I mentioned that those angle brackets are a problem? They are. An XSL stylesheet is itself an XML document, so the processor will read any literal brackets in it as XML markup, and the crazy nested bracket tags InDesign uses are not well-formed XML. There are two workarounds:

Workaround one - encode all angle brackets you wish to output using their entity references (&lt; and &gt;). It can be painful to do a search and replace here, because in your XSL sheet, you’ll have the XSL instructions in valid XML and you don’t want to encode those brackets.
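As a rough sketch of the entity approach, assuming a hypothetical automaker element with a name attribute, a template might look like this. Only the InDesign tags are encoded; the XSL instructions keep their real brackets.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="windows-1252"/>

  <!-- Hypothetical input: <automaker name="..."> ... </automaker> -->
  <xsl:template match="automaker">
    <!-- With method="text", &lt; and &gt; come out as plain < and >. -->
    <xsl:text>&lt;ParaStyle:aAutomaker&gt;</xsl:text>
    <!-- Reading an attribute, which the built-in XML import could not do. -->
    <xsl:value-of select="@name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```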

The other workaround is to wrap everything in CDATA tags. They look like this: <![CDATA[...]]>. Any and all text you put in the inner set of brackets will be output directly. It’s convenient, but can get ugly when you are dealing with specific tab stops and several character styles in one line of output - your XSL sheet will be downright unreadable to anyone you pass it along to.
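Here is the same kind of template sketched with a CDATA section instead, again assuming a hypothetical automaker element with a name attribute. Wrapping the CDATA in xsl:text keeps the surrounding indentation of the stylesheet out of the output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="windows-1252"/>

  <!-- Hypothetical input: <automaker name="..."> ... </automaker> -->
  <xsl:template match="automaker">
    <!-- Everything between the inner brackets is echoed as-is. -->
    <xsl:text><![CDATA[<ParaStyle:aAutomaker>]]></xsl:text>
    <xsl:value-of select="@name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The InDesign tag is easier to read this way, but every switch between raw text and an XSL instruction means closing one CDATA section and opening another, which is where the clutter comes from.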

There are a few other pieces of housekeeping you’ll need to do before you are ready to go. For some reason, InDesign only recognizes text files with a specific encoding as InDesign tagged text (I suspect this is to prevent non-InDesign applications from generating InDesign tagged text willy-nilly), so you need to set that encoding and get the processor to output your data properly. I start my XSL stylesheet with this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="windows-1252"/>

The first two lines open the stylesheet: the XML declaration and the xsl:stylesheet element that every XSL stylesheet needs. Setting the output method to text will translate any of your encoded entities (brackets, soft returns, tabs, etc.) back into their non-encoded versions in the output file, and the encoding will make sure that your file is recognized as an InDesign tagged text file.

The next bit is your InDesign tagged text header. Here’s mine:

<ANSI-WIN>
<Version:5><FeatureSet:InDesign-Roman><ColorTable:=<PANTONE 201 C:COLOR:CMYK:Spot:0,1,0.63,0.29><Black:COLOR:CMYK:Process:0,0,0,1>>
<DefineCharStyle:aPipe=<Nextstyle:aPipe><cTypeface:Regular><cSize:5.500000><cTracking:50><cBaselineShift:0.500000><cFont:Blue Highway><cColorTint:50.000000>>
<DefineCharStyle:aCar Info Header Numbers=<Nextstyle:aCar Info Header Numbers>>
<DefineCharStyle:aCar Info Prices=<Nextstyle:aCar Info Prices><cTypeface:Regular><cSize:8.000000><cTracking:0><cLeading:7.800000><cFont:Blue Highway>>
<DefineCharStyle:aNotes=<Nextstyle:aNotes><cTypeface:Italic><cSize:7.000000><cHorizontalScale:1.000000><cAutoPairKern:Metrics><cLigatures:0><cTracking:0><cBaselineShift:0.000000><cCase:Small Caps><cKerning:1e\+08><cVerticalScale:1.000000><cLeading:8.300000><cLanguage:English\: USA><cNoBreak:0><cUnderline:0><cFont:Arial Narrow><cPosition:Normal><cStrikethru:0><cSkew:0.000000>>
<DefineCharStyle:Half-Number=<Nextstyle:Half-Number><cLigatures:0><cFont:Arial><cPosition:Normal><cOTFContAlt:0>>
<DefineParaStyle:NormalParagraphStyle=<Nextstyle:NormalParagraphStyle><cSize:8.000000><cFont:Blue Highway><bulFont:\<TextFont\>><bulTypeFace:\<TextStyle\>>>
<DefineParaStyle:aAutomaker=<BasedOn:NormalParagraphStyle><Nextstyle:aAutomaker><cColor:PANTONE 201 C><cSize:8.199999><cTracking:250><cCase:All Caps><cVerticalScale:1.250000><pSpaceBefore:4.500000><pTabRuler:117.89999999999999\,Right\,.\,0\,\;146.70000000000002\,Right\,.\,0\,\;175.5\,Right\,.\,0\,\;203.4\,Right\,.\,0\,\;><pKeepWithNext:2><pTextAlignment:AwayFromSpine><bulFont:\<TextFont\>><bulTypeFace:\<TextStyle\>>>
<DefineParaStyle:aCar Info Header=<BasedOn:aAutomaker><Nextstyle:aCar Info Header><cColor:Black><cSize:7.000000><cTracking:0><cVerticalScale:1.100000><pSpaceBefore:0.000000><cColorTint:80.000000><pRuleBelowColor:Black><pRuleBelowStroke:0.250000><pRuleBelowOffset:0.720000><bulFont:\<TextFont\>><bulTypeFace:\<TextStyle\>>>
<DefineParaStyle:aCar Info=<BasedOn:aCar Info Header><Nextstyle:aCar Info><cSize:8.199999><cHorizontalScale:0.900000><cTracking:25><cCase:Normal><pLeftIndent:18.000000><pFirstLineIndent:-18.000000><cLeading:7.800000><pTabRuler:121.5\,Right\,.\,0\,\;150.29999999999998\,Right\,.\,0\,\;179.1\,Right\,.\,0\,\;207\,Right\,.\,0\,\;><pKeepParaTogether:1><pKeepWithNext:0><pKeepLines:1><cColorTint:100.000000><pRuleAboveColor:Black><pRuleAboveStroke:0.250000><pRuleAboveTint:80.000000><pRuleAboveOffset:6.000000><pRuleBelowTint:80.000000><pRuleBelowOffset:1.080000><pRuleAboveOn:1><pRuleBelowOn:0><pRuleAboveStrokeType:JapaneseDots><pRuleBelowStrokeType:JapaneseDots><pTextAlignment:Left><bulFont:\<TextFont\>><bulTypeFace:\<TextStyle\>>>
<DefineParaStyle:aCar Sub Info=<BasedOn:aCar Info Header><Nextstyle:aCar Sub Info><cHorizontalScale:0.900000><cTracking:25><cCase:Normal><cKerning:-25><pLeftIndent:27.000000><pFirstLineIndent:-9.000000><pTabRuler:121.5\,Right\,.\,0\,\;150.29999999999998\,Right\,.\,0\,\;179.1\,Right\,.\,0\,\;207\,Right\,.\,0\,\;><pKeepParaTogether:1><pKeepWithNext:0><pKeepLines:1><cColorTint:100.000000><pRuleAboveColor:Black><pRuleAboveStroke:0.250000><pRuleAboveTint:80.000000><pRuleAboveOffset:6.000000><pRuleBelowOffset:1.080000><pRuleBelowMode:Text><pRuleAboveOn:1><pRuleBelowOn:0><pRuleAboveStrokeType:JapaneseDots><pRuleBelowStrokeType:JapaneseDots><pTextAlignment:Left><bulFont:\<TextFont\>><bulTypeFace:\<TextStyle\>>>
<DefineParaStyle:aNotes=<BasedOn:NormalParagraphStyle><Nextstyle:aNotes><cTypeface:Italic><cSize:7.000000><cCase:Small Caps><pLeftIndent:54.000000><cLeading:8.300000><pHyphenation:0><cFont:Arial Narrow><pKeepParaTogether:1><pKeepLines:1><pBalanceLines:1><pTextAlignment:Right><bulFont:\<TextFont\>><bulTypeFace:\<TextStyle\>>>
<DefineParaStyle:aBuzz=<BasedOn:aNotes><Nextstyle:aBuzz><cColor:PANTONE 201 C><bulFont:\<TextFont\>><bulTypeFace:\<TextStyle\>>>

You could bother to understand all that, and generate it in your stylesheet, but it’s not worth your time. What I did was set up my InDesign file with some sample text, style everything properly, and then export that as tagged text. Then I simply copied the header out of that file, wrapped the entire thing in a CDATA tag, and instructed the XSL stylesheet to echo it when it matches the root XML element. An upside of this is that I could place my final tagged text into an arbitrary InDesign document and it would bring along the paragraph and character styles with it.
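Echoing the header on the root might be sketched like this. Only the first two header lines are shown; the exported DefineCharStyle and DefineParaStyle lines would be pasted into the same CDATA section. Line breaks inside the CDATA are copied to the output as written, so the header stays flush left, and whether InDesign wants different line endings on your platform is something I am leaving as an open assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="windows-1252"/>

  <!-- Runs once, at the document root, before any content templates. -->
  <xsl:template match="/">
    <xsl:text><![CDATA[<ANSI-WIN>
<Version:5><FeatureSet:InDesign-Roman><ColorTable:=<PANTONE 201 C:COLOR:CMYK:Spot:0,1,0.63,0.29><Black:COLOR:CMYK:Process:0,0,0,1>>
]]></xsl:text>
    <!-- Hand over to the templates that emit the paragraphs themselves. -->
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```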

From here, it’s just a matter of writing your XSL stylesheet to output the text you need.

6 Comments

  1. J

    This is the first time I’ve seen someone say that InDesign’s XML handling is slow. I thought I was the only one! There’s a reason Adobe uses a cookery recipe for their XML demos - it’s nice and simple!

  2. Ronny Adak

    Hi,

    We faced a problem while displaying the values of XML attributes in InDesign. We discussed it with the technical teams of 3B2 and TeX, and they do it by applying XSLT. But it seems quite difficult in InDesign; it’s showing a DOM Error…

    With Regards
    Ronny Adak

  3. Jim Hanifen

    Thank you for your post. This was helpful. I am still having problems with the XML import into InDesign. I am really excited for the next version to come out; CS3 seems to be very, very picky when it comes to doing anything with XML. I am not sure why Adobe cannot take the same great technology in Dreamweaver and apply it to InDesign.

    Thanks Again.

  4. Narration

    Reaxion writer, thank you kindly for the kind of real-world experience write-up that is very useful to others when shared.

    Such good memories of Portland - and the summer weather is definitely when you don’t want to be locked in the office.

    Best wishes…archtecture I think can be a fascinating direction.

  5. I found your site on technorati and read a few of your other posts. Keep up the good work. I just added your RSS feed to my Google News Reader. Looking forward to reading more from you down the road!

  6. Your blog is interesting!

    Keep up the good work!
