<refentry id="make.index.markup">
<refmeta>
<refentrytitle>make.index.markup</refentrytitle>
<refmiscinfo role="type">boolean</refmiscinfo>
</refmeta>
<refnamediv>
<refname>make.index.markup</refname>
<refpurpose>Generate XML index markup in the index?</refpurpose>
</refnamediv>

<refsynopsisdiv>
<src:fragment id='make.index.markup.frag'>
<xsl:param name="make.index.markup" select="0"/>
</src:fragment>
</refsynopsisdiv>

<refsect1><title>Description</title>

<para>This parameter enables a very neat trick for getting properly
merged, collated back-of-the-book indexes. G. Ken Holman suggested
this trick at Extreme Markup Languages 2002 and I'm indebted to him
for it.</para>

<para>Jeni Tennison's excellent code in
<filename>autoidx.xsl</filename> does a great job of merging and
sorting <sgmltag>indexterm</sgmltag>s in the document and building a
back-of-the-book index. However, there's one thing that it cannot
reasonably be expected to do: merge page numbers into ranges, because
the page numbers are not known until the FO processor has laid out the
pages. (I would not have thought that it could collate and suppress
duplicate page numbers, but in fact it appears to manage that task
somehow.)</para>

<para>Ken's trick is to produce a document in which the index at the
back of the book is <quote>displayed</quote> in XML. Because the FO
processor formats that index, all of the page numbers in it have been
resolved. In other words, instead of having an index at the back of the
book that looks like this:</para>

<blockquote>
<formalpara><title>A</title>
<para>ap1, 1, 2, 3</para>
</formalpara>
</blockquote>

<para>you get one that looks like this:</para>

<blockquote>
<programlisting><![CDATA[<indexdiv>A</indexdiv>
<indexentry>
<primaryie>ap1</primaryie>,
<phrase role="pageno">1</phrase>,
<phrase role="pageno">2</phrase>,
<phrase role="pageno">3</phrase>
</indexentry>]]></programlisting>
</blockquote>

<para>After building a PDF file with this sort of odd-looking index, you can
extract the text from the PDF file and the result is a proper index expressed in
XML.</para>
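
<para>To build that PDF, the parameter has to be turned on for the FO run.
A minimal sketch of a customization layer, assuming the stock FO stylesheet
can be imported from the canonical <filename>current/fo/docbook.xsl</filename>
URI (adjust the import to match your installation):</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Assumption: this URI resolves, usually through an XML catalog. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

  <!-- Emit the index as displayed XML markup instead of a formatted index. -->
  <xsl:param name="make.index.markup" select="1"/>
</xsl:stylesheet>]]></programlisting>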

<para>Now you have data that's amenable to processing, and a simple Perl script
(such as <filename>fo/pdf2index</filename>) can
merge consecutive page numbers into ranges and generate a proper index.</para>

<para>Finally, reformat your original document using this literal index instead of
an automatically generated one and <quote>bingo</quote>!</para>
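
<para>As an illustration only, a literal index for the example above might
look something like the following once pages 1, 2 and 3 have been merged into
a range. The exact markup produced by the script is an assumption here; the
point is that an <sgmltag>index</sgmltag> that already has content is expected
to be formatted as written rather than regenerated, so
<parameter>make.index.markup</parameter> can go back to its default of 0 for
this final run:</para>

<programlisting><![CDATA[<index>
  <title>Index</title>
  <indexdiv>
    <title>A</title>
    <!-- Hypothetical merged entry: pages 1, 2, 3 collapsed to one range. -->
    <indexentry>
      <primaryie>ap1, 1-3</primaryie>
    </indexentry>
  </indexdiv>
</index>]]></programlisting>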

</refsect1>
</refentry>