<?xml version="1.0" encoding="UTF-8"?>
<?sdop xref_rgb="0,0,1"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book>
<bookinfo>
<title>The xfpt plain text to XML processor</title>
<titleabbrev>xfpt</titleabbrev>
<date>22 March 2007</date>
<author>
  <firstname>Philip</firstname>
  <surname>Hazel</surname>
</author>
<authorinitials>PH</authorinitials>
<revhistory><revision><revnumber>0.01</revnumber><date>22 March 2007</date><authorinitials>PH</authorinitials></revision></revhistory>
<copyright><year>2007</year><holder>University of Cambridge</holder></copyright>
</bookinfo>
<chapter>
<title>Introduction</title>
<para>
<emphasis>xfpt</emphasis> is a program that reads a marked-up ASCII source file, and converts it into
XML. It was written with DocBook XML in mind, but can also be used for other
forms of XML. Unlike <emphasis>AsciiDoc</emphasis> (<emphasis role="bold"><ulink url="http://www.methods.co.nz/asciidoc/">http://www.methods.co.nz/asciidoc/</ulink></emphasis>),
<emphasis>xfpt</emphasis> does not try to produce XML from a document that is also usable as a
freestanding ASCII document. The input for <emphasis>xfpt</emphasis> is very definitely <quote>marked
up</quote>. This makes it less ambiguous for large and/or complicated documents. <emphasis>xfpt</emphasis>
is also much faster than <emphasis>AsciiDoc</emphasis> because it is written in C and does not
rely on pattern matching.
</para>
<para>
<emphasis>xfpt</emphasis> is aimed at users who understand the XML that they are generating. It makes
it easy to include literal XML, either in blocks, or within paragraphs. <emphasis>xfpt</emphasis>
restricts itself to two special characters that trigger all its processing.
</para>
<para>
<emphasis>xfpt</emphasis> treats any input line that starts with a dot as a <emphasis>directive</emphasis> line.
Directives control the way the input is processed. A small number of directives
are implemented in the program itself. A macro facility makes it possible to
combine these in various ways to define directives for higher-level concepts
such as chapters and sections. A standard macro library that generates a simple
subset of DocBook XML is provided. The only XML element that the program itself
generates is <literal>&lt;para&gt;</literal>; all the others must be included as literal XML, either
directly in the input text, or, more commonly, as part of the text that is
generated by a macro call.
</para>
<para>
The ampersand character is special within non-literal text that is processed by
<emphasis>xfpt</emphasis>. An ampersand introduces a <emphasis>flag sequence</emphasis> that modifies the output.
Ampersand was chosen because it is also special in XML. As well as recognizing
flag sequences that begin with an ampersand, <emphasis>xfpt</emphasis> converts grave accents and
apostrophes that appear in non-literal text into typographic opening and
closing quotes, as follows:
</para>
<literallayout>
 <literal>&nbsp;&#x60;    </literal>  becomes &nbsp;&#x2018;
 <literal>&nbsp;&#x27;    </literal>  becomes &nbsp;&#x2019;
</literallayout>
<para>
Within normal input text, ampersand, grave accent, and apostrophe are the only
characters that cause <emphasis>xfpt</emphasis> to change the input text, and this applies only to
non-literal text. In literal text, there are no markup characters, and only a
dot at the start of a line is recognized as special. Within the body of a
macro, there is one more special character: the dollar character is used to
introduce an argument substitution.
</para>
<para>
Notwithstanding the previous paragraph, <emphasis>xfpt</emphasis> knows that it is generating XML,
and in all cases when a literal ampersand or angle bracket is required in the
output, the appropriate XML entity reference (<literal>&amp;amp;</literal>, <literal>&amp;lt;</literal>, or
<literal>&amp;gt;</literal>, respectively) is generated.
</para>
<section>
<title>The <emphasis>xfpt</emphasis> command line</title>
<para>
The format of the <emphasis>xfpt</emphasis> command line is:
</para>
<literallayout>
 <literal>xfpt [</literal><emphasis>options</emphasis><literal>] [</literal><emphasis>input source</emphasis><literal>]</literal>
</literallayout>
<para>
If no input is specified, the standard input is read. There are three options:
</para>
<literallayout>
 <literal>-help</literal>
</literallayout>
<para>
This option causes <emphasis>xfpt</emphasis> to output its <quote>usage</quote> message, and exit.  
</para>
<literallayout>
 <literal>-o </literal><emphasis>&lt;output destination&gt;</emphasis>
</literallayout>
<para>
This option overrides the default destination. If the standard input is being
read, the default destination is the standard output. Otherwise, the default
destination is the name of the input file with the extension <filename>.xml</filename>,
replacing its existing extension if there is one. A single hyphen character can
be given as an output destination to refer to the standard output.
</para>
<literallayout>
  <literal>-S </literal><emphasis>&lt;directory path&gt;</emphasis>
</literallayout>
<para>
This option overrides the path to <emphasis>xfpt</emphasis>&#x2019;s library directory that is built into
the program. This makes it possible to use or test alternate libraries.
</para>
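<para>
The following command lines are a short sketch of these options in use; the
file and directory names are hypothetical. The first writes <filename>mydoc.xml</filename>.
The second sends the output to the standard output. The third reads the
standard input and writes to the standard output, which the shell redirects
here. The fourth uses an alternative library directory and an explicit output file:
</para>
<literallayout class="monospaced">
 xfpt mydoc.xfpt
 xfpt -o - mydoc.xfpt
 xfpt &lt;mydoc.xfpt &gt;mydoc.xml
 xfpt -S /home/me/xfpt-test/lib -o test.xml mydoc.xfpt
</literallayout>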
</section>
<section>
<title>A short <emphasis>xfpt</emphasis> example</title>
<para>
Here is a very short example of a complete <emphasis>xfpt</emphasis> input file that uses some of the
standard macros and flags:
</para>
<literallayout class="monospaced">
 .include stdflags
 .include stdmacs
 .docbook
 .book

 .chapter "The first chapter"
 This is the text of the first chapter. Here is an &amp;'italic'&amp;
 word, and here is a &amp;*bold*&amp; one.

 .section "This is a section heading"
 We can use the &amp;*ilist*&amp; macro to generate an itemized list:
 .ilist
 The first item in the list.
 .next
 The last item in the list.
 .endlist

 There are also standard macros for ordered lists, literal
 layout blocks, code blocks, URL references, index entries
 and tables.
</literallayout>
</section>
<section id="SECTliteralprocessing">
<title>Literal and non-literal processing</title>
<para>
<emphasis>xfpt</emphasis> processes non-directive input lines in one of four ways (known as
<quote>modes</quote>):
</para>
<itemizedlist>
<listitem>
<para>
In the default mode, text is processed paragraph by paragraph. The end of a
paragraph is indicated by the end of the input, a blank line, or by an
occurrence of the <emphasis role="bold">.literal</emphasis> directive. Other directives (for example,
<emphasis role="bold">.include</emphasis>) do not of themselves terminate a paragraph. Most of the standard
macros (such as <emphasis role="bold">.chapter</emphasis> and <emphasis role="bold">.section</emphasis>) force a paragraph end by
starting their contents with a <emphasis role="bold">.literal</emphasis> directive.
</para>
<para>
Because <emphasis>xfpt</emphasis> reads a whole paragraph before processing it, error messages
contain the phrase <quote>detected near line <emphasis>nnn</emphasis></quote>, where the line number is
typically that of the last line of the paragraph.
</para>
</listitem>
<listitem>
<para>
In the <quote>literal layout</quote> mode, text is processed line by line, but is
otherwise handled as in the default mode. The only real difference this makes
to the markup from the user&#x2019;s point of view is that paired flags must be on the
same line. In this mode, error messages are more likely to contain the exact
line number where the fault lies. Literal layout mode is used by the standard
<emphasis role="bold">.display</emphasis> macro to generate <literal>&lt;literallayout&gt;</literal> elements.
</para>
</listitem>
<listitem>
<para>
In the <quote>literal text</quote> mode, text is also processed line by line, but no flags
are recognized. The only modification <emphasis>xfpt</emphasis> makes to the text is to turn
ampersand and angle bracket characters into XML entity references. This mode is
used by the standard <emphasis role="bold">.code</emphasis> macro to generate <literal>&lt;literallayout&gt;</literal> elements
that include <literal>class="monospaced"</literal>.
</para>
</listitem>
<listitem>
<para>
In the <quote>literal XML</quote> mode, text lines are copied to the output without
modification. This is the easiest way to include a chunk of literal XML in the
output. An example might be the <literal>&lt;bookinfo&gt;</literal> element, which occurs only once
in a document. It is not worth setting up a macro for a one-off item like this.
</para>
</listitem>
</itemizedlist>
<para>
The <emphasis role="bold">.literal</emphasis> directive switches between the modes. It is not normally used
directly; instead, it is incorporated into appropriate macro definitions.
</para>
<para>
Directive lines are recognized and acted upon in all four modes. However, an
unrecognized line that starts with a dot in the literal text or literal XML
mode is treated as data. In the other modes, such a line provokes an error.
</para>
<para>
If you need a data line that begins with a dot in literal layout mode, you can
either write the dot as a numerical character reference, or precede it with
some markup that has no effect. These two examples are both valid:
</para>
<literallayout class="monospaced">
 &amp;#x2e;start with a dot
 &amp;''&amp;.start with a dot
</literallayout>
<para>
The second example assumes that the standard flags are defined: it precedes the
dot with an empty italic string. However, this is untidy, because an empty
<literal>&lt;emphasis&gt;</literal> element is carried over into the XML.
</para>
<para>
In literal text or literal XML mode, it is not possible to have a data line
that starts with a dot followed by the name of a directive or macro. You have
to use literal layout mode if you require such output. Another solution, which
is used in the source for this document (where many examples show directive
lines), is to indent every displayed line by one space, and thereby avoid the
problem altogether.
</para>
</section>
<section>
<title>Format of directive lines</title>
<para>
If an input line starts with a dot followed by a space, it is ignored by <emphasis>xfpt</emphasis>.
This provides a facility for including comments in the input. Otherwise, the
dot must be followed by a directive or macro name, and possibly one or more
arguments. Arguments that are strings are delimited by white space unless they
are enclosed in single or double quotes. The delimiting quote character can be
included within a quoted string by doubling it. Here are some examples:
</para>
<literallayout class="monospaced">
 .literal layout
 .set version 0.00
 .row "Jack's house" 'Jill''s house'
</literallayout>
<para>
An unrecognized directive line normally causes an error; however, in the
literal text and literal XML modes, an unrecognized line that starts with a
dot is treated as a data line.
</para>
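<para>
As a small illustration of these rules, the first of the following lines is a
comment and is ignored. The second uses doubled quotes inside a quoted
string. The <emphasis role="bold">.echo</emphasis> directive (described below) should therefore write
the single argument <literal>He said "yes" to Jill's idea</literal> to the standard error stream:
</para>
<literallayout class="monospaced">
 . This is a comment line
 .echo "He said ""yes"" to Jill's idea"
</literallayout>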
</section>
<section id="SECTcallingmacro">
<title>Calling macros</title>
<para>
Macros are defined by the <emphasis role="bold">.macro</emphasis> directive, which is described in section
<xref linkend="SECTmacro"/>. There are two ways of calling a macro. It can be called in the
same way as a directive, or it can be called from within text that is being
processed. The second case is called an <quote>inline macro call</quote>.
</para>
<para>
When a macro is called as a directive, its name is given after a dot at the
start of a line, and the name may be followed by any number of optional
arguments, in the same way as a built-in directive (see the previous section).
For example:
</para>
<literallayout class="monospaced">
 .chapter "Chapter title" chapter-reference
</literallayout>
<para>
The contents of the macro, after argument substitution, are processed in
exactly the same way as normal input lines. A macro that is called as a 
directive may contain nested macro calls.
</para>
<para>
When a macro is called from within a text string, its name is given after an
ampersand, and is followed by an opening parenthesis. Arguments, delimited by
commas, can then follow, up to a closing parenthesis. If an argument contains a
comma or a closing parenthesis, it must be quoted. White space after a
separating comma is ignored. The most common example of this type of macro
call is the standard macro for generating a URL reference:
</para>
<literallayout class="monospaced">
 Refer to a URL via &amp;url(http://x.example,this text).
</literallayout>
<para>
There are differences in the behaviour of macros, depending on which way they
are called. A macro that is called inline may not contain references to other
macros; it must contain only text lines and calls to built-in directives.
Also, newlines that terminate text lines within the macro are not included in
the output.
</para>
<para>
A macro that can be called inline can always be called as a directive, but the
opposite is not always true. Macros are usually designed to be called either
one way or the other. However, the <emphasis role="bold">.new</emphasis> and <emphasis role="bold">.index</emphasis> macros in the
standard library are examples of macros that are designed to be called either way.
</para>
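<para>
For example, a macro along the following lines contains only a text line, so it
can be called either as a directive or inline. The name <emphasis role="bold">product</emphasis> is
hypothetical, and the standard flags are assumed to be defined:
</para>
<literallayout class="monospaced">
 .macro product
 &amp;'$1'&amp; version $2
 .endmacro

 This manual describes &amp;product(xfpt, 0.01).
 .product xfpt 0.01
</literallayout>
<para>
The inline call inserts the emphasized product name and the version into the
surrounding sentence, with no newline. The directive call should produce the
same text as a line of its own.
</para>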
</section>
</chapter>

<chapter>
<title>Flag sequences</title>
<para>
Only one flag sequence is built into the code itself. If an input line ends
with three ampersands (ignoring trailing white space), the ampersands are
removed, and the next input line, with any leading white space removed, is
joined to the original line. This happens before any other processing, and may
involve any number of lines. Thus:
</para>
<literallayout>
 <literal>The quick &amp;&amp;&amp;</literal>
 <literal>    brown &amp;&amp;&amp;</literal>
 <literal>      fox.</literal>
</literallayout>
<para>
produces exactly the same output as:
</para>
<literallayout class="monospaced">
 The quick brown fox.
</literallayout>
<section>
<title>Flag sequences for XML entities and <emphasis>xfpt</emphasis> variables</title>
<para>
If an ampersand is followed by a # character, a number, and a semicolon, it is
understood as a numerical character reference, and is passed through
unmodified. The number can be decimal, or hexadecimal preceded by <literal>x</literal>. For
example:
</para>
<literallayout class="monospaced">
 This is an Ohm sign: &amp;#x2126;.
 This is a degree sign: &amp;#176;.
</literallayout>
<para>
If an ampersand is followed by a letter, a sequence of letters, digits, and
dots is read. If this is terminated by a semicolon, the characters between the
ampersand and the semicolon are interpreted as an entity name. This can be:
</para>
<itemizedlist>
<listitem>
<para>
The name of an inbuilt <emphasis>xfpt</emphasis> variable. At present, there is only one of these,
called <literal>xfpt.rev</literal>. Its use is described with the <emphasis role="bold">.revision</emphasis> directive
below.
</para>
</listitem>
<listitem>
<para>
The name of a user variable that has been set by the <emphasis role="bold">.set</emphasis> directive, also
described below.
</para>
</listitem>
<listitem>
<para>
The name of an XML entity. This is assumed if the name is not recognized as one
of the previous types. In this case, the input text is passed to the output
without modification. For example:
</para>
<literallayout class="monospaced">
 This is an Ohm sign: &amp;Ohm;.
</literallayout>
</listitem>
</itemizedlist>
</section>
<section>
<title>Flag sequences for calling macros</title>
<para>
If an ampersand is followed by a sequence of alphanumeric characters starting
with a letter, terminated by an opening parenthesis, the characters between the
ampersand and the parenthesis are interpreted as the name of a macro. See
section <xref linkend="SECTcallingmacro"/> for more details.
</para>
</section>
<section>
<title>Other flag sequences</title>
<para>
Any other flag sequences that are needed must be defined by means of the
<emphasis role="bold">.flag</emphasis> directive. These are of two types, standalone and paired. Both cases
define replacement text. This is always literal; it is not itself scanned for
flag occurrences.
</para>
<para>
Lines are scanned from left to right when flags are being interpreted. If
there is any ambiguity when a text string is being scanned, the longest flag
sequence wins. Thus, it is possible (as in the standard flag sequences) to
define both <literal>&amp;&lt;</literal> and <literal>&amp;&lt;&lt;</literal> as flags, provided that you never want to
follow the first of them with a <literal>&lt;</literal> character.
</para>
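<para>
With the standard flags, for instance, both of the following lines are scanned
unambiguously. In the first, <literal>&amp;&lt;&lt;</literal> is the longest matching flag, so an
<literal>&lt;xref&gt;</literal> element is generated. In the second, only <literal>&amp;&lt;</literal> matches, and the
text between each pair of flags is copied as literal XML:
</para>
<literallayout class="monospaced">
 See section &amp;&lt;&lt;SECTmacro&gt;&gt;&amp; for details.
 This is &amp;&lt;emphasis&gt;&amp;emphasized&amp;&lt;/emphasis&gt;&amp; text.
</literallayout>
<para>
These should become <literal>&lt;xref linkend="SECTmacro"/&gt;</literal> and
<literal>&lt;emphasis&gt;emphasized&lt;/emphasis&gt;</literal>, respectively.
</para>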
<para>
You can define flags that start with <literal>&amp;#</literal>, but these must be used with care,
lest they be misinterpreted as numerical character references.
</para>
<para>
A standalone flag consists of an ampersand followed by any number of
non-alphanumeric characters. When it is encountered, it is replaced by its
replacement text. For example, in the standard flag definitions, <literal>&amp;&amp;</literal>
is defined as a standalone flag with the replacement text <literal>&amp;amp;</literal>.
</para>
<para>
A paired flag is defined as two sequences. The first takes the same form as a
standalone flag. The second also consists of non-alphanumeric characters, but
need not start with an ampersand. It is often defined as the reverse of the
first sequence. When the first sequence of a paired flag is encountered, its
partner is expected to be found within the same paragraph (or line, in literal
layout mode). Furthermore, multiple occurrences of paired flags must be
correctly nested. For example, in the standard definitions, <literal>&amp;&#x27;</literal> and
<literal>&#x27;&amp;</literal> are defined as a flag pair for enclosing text in an <literal>&lt;emphasis&gt;</literal>
element. Each member of the pair is replaced by its replacement text.
</para>
<para>
Note that, though <emphasis>xfpt</emphasis> diagnoses an error for badly nested flag pairs, it does
not prevent you from generating invalid XML. For example, DocBook does not
allow <literal>&lt;emphasis&gt;</literal> within <literal>&lt;literal&gt;</literal>, though it does allow <literal>&lt;literal&gt;</literal>
within <literal>&lt;emphasis&gt;</literal>.
</para>
</section>
<section>
<title>Unrecognized flag sequences</title>
<para>
If an ampersand is not followed by a character sequence in one of the forms
described in the preceding sections, an error occurs.
</para>
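<para>
A common case is an ampersand that is intended as data. The first of these lines
provokes an error, because <literal>&amp;D</literal> is neither an entity reference nor a
macro call. The second, which assumes that the standard flags are defined, uses
the standalone <literal>&amp;&amp;</literal> flag to produce <literal>R&amp;amp;D</literal> in the XML:
</para>
<literallayout class="monospaced">
 The R&amp;D department
 The R&amp;&amp;D department
</literallayout>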
</section>
<section>
<title>Standard flag sequences</title>
<para>
These are the standalone flag sequences that are defined in the <filename>stdflags</filename>
file in the <emphasis>xfpt</emphasis> library:
</para>
<literallayout>
 <literal>&amp;&amp;        </literal> becomes <literal> &amp;amp;</literal> (ampersand)
 <literal>&amp;--       </literal> becomes <literal> &amp;ndash;</literal> (en-dash)
 <literal>&amp;~        </literal> becomes <literal> &amp;nbsp;</literal> (&#x2018;hard&#x2019; space)
</literallayout>
<para>
These are the flag pairs that are defined in the <filename>stdflags</filename> file in the <emphasis>xfpt</emphasis>
library:
</para>
<literallayout>
 <literal>&amp;"..."&amp;   </literal> becomes <literal>&lt;quote&gt;...&lt;/quote&gt;</literal>
 <literal>&amp;&#x27;...&#x27;&amp;   </literal> becomes <literal>&lt;emphasis&gt;...&lt;/emphasis&gt;</literal>
 <literal>&amp;*...*&amp;   </literal> becomes <literal>&lt;emphasis role="bold"&gt;...&lt;/emphasis&gt;</literal>
 <literal>&amp;&#x60;...&#x60;&amp;   </literal> becomes <literal>&lt;literal&gt;...&lt;/literal&gt;</literal>
 <literal>&amp;_..._&amp;   </literal> becomes <literal>&lt;filename&gt;...&lt;/filename&gt;</literal>
 <literal>&amp;(...)&amp;   </literal> becomes <literal>&lt;command&gt;...&lt;/command&gt;</literal>
 <literal>&amp;[...]&amp;   </literal> becomes <literal>&lt;function&gt;...&lt;/function&gt;</literal>
 <literal>&amp;%...%&amp;   </literal> becomes <literal>&lt;option&gt;...&lt;/option&gt;</literal>
 <literal>&amp;$...$&amp;   </literal> becomes <literal>&lt;varname&gt;...&lt;/varname&gt;</literal>
 <literal>&amp;&lt;...&gt;&amp;   </literal> becomes <literal>&lt;...&gt;</literal>
 <literal>&amp;&lt;&lt;...&gt;&gt;&amp; </literal> becomes <literal>&lt;xref linkend="..."/&gt;</literal>
</literallayout>
<para>
For example, if you want to include a literal XML element in your output, you
can do it like this: <literal>&amp;&lt;element&gt;&amp;</literal>. If you want to include a longer
sequence of literal XML, changing to the literal XML mode may be more
convenient.
</para>
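<para>
As a further sketch, this input line uses several of the standard flags at
once. The page numbers are arbitrary:
</para>
<literallayout class="monospaced">
 Run &amp;(xfpt)&amp; with the &amp;%-o%&amp; option; see pages 5&amp;--10.
</literallayout>
<para>
It should produce <literal>&lt;command&gt;xfpt&lt;/command&gt;</literal>, <literal>&lt;option&gt;-o&lt;/option&gt;</literal>, and
<literal>&amp;ndash;</literal> in the corresponding places.
</para>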
</section>
</chapter>

<chapter>
<title>Built-in directive processing</title>
<para>
The directives that are built into the code of <emphasis>xfpt</emphasis> are now described in
alphabetical order. You can see more examples of their use in the definitions
of the standard macros in chapter <xref linkend="CHAPstdmac"/>.
</para>
<section>
<title>The <emphasis role="bold">.arg</emphasis> directive</title>
<para>
This directive may appear only within the body of a macro. It must be followed
by a single number, optionally preceded by a minus sign. If the number is
positive (no minus sign), subsequent lines, up to a <emphasis role="bold">.endarg</emphasis> directive, are
skipped unless the macro has been called with at least that number of
arguments and the given argument is not an empty string. If the number is
negative (minus sign present), subsequent lines are skipped if the macro has
been called with fewer than that number of arguments, or with an empty string 
for the given argument. For example:
</para>
<literallayout class="monospaced">
 .macro example
 .arg 2
 Use these lines if there are at least 2 arguments 
 and the second one is not empty. Normally there would 
 be a reference to the 2nd argument. 
 .endarg
 .arg -2 
 Use this line unless there are at least 2 arguments 
 and the second one is not empty. 
 .endarg 
 .endmacro
</literallayout>
</section>
<section>
<title>The <emphasis role="bold">.eacharg</emphasis> directive</title>
<para>
This directive may appear only within the body of a macro. It may optionally be
followed by a single number; if it is omitted, the value is taken to be 1. Subsequent
lines, up to a <emphasis role="bold">.endeach</emphasis> directive, are processed multiple times, once for
each remaining argument. Unlike <emphasis role="bold">.arg</emphasis>, an argument that is an empty string 
is not treated specially.
</para>
<para>
The number given with <emphasis role="bold">.eacharg</emphasis> defines which argument to start with. While
these lines are being processed, the remaining macro arguments can be
referenced relative to the current argument. <literal>$+1</literal> refers to the current
argument, <literal>$+2</literal> to the next argument, and so on.
</para>
<para>
The <emphasis role="bold">.endeach</emphasis> directive may also be followed by a number, again defaulting
to 1. When <emphasis role="bold">.endeach</emphasis> is reached, the current argument number is incremented
by that number. If there are still unused arguments available, the lines
between <emphasis role="bold">.eacharg</emphasis> and <emphasis role="bold">.endeach</emphasis> are processed again.
</para>
<para>
This example is taken from the coding for the standard <emphasis role="bold">.row</emphasis> macro, which
generates an <literal>&lt;entry&gt;</literal> element for each of its arguments:
</para>
<literallayout class="monospaced">
 .eacharg
 &amp;&lt;entry&gt;&amp;$+1&amp;&lt;/entry&gt;&amp;
 .endeach
</literallayout>
<para>
This example is taken from the coding for the standard <emphasis role="bold">.itable</emphasis> macro, which
processes arguments in pairs to define the table&#x2019;s columns, starting from the
fifth argument:
</para>
<literallayout class="monospaced">
 .eacharg 5
 &amp;&lt;colspec colwidth="$+1" align="$+2"/&gt;&amp;
 .endeach 2
</literallayout>
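<para>
The following hypothetical macros show this stepping in isolation. Calling
<emphasis role="bold">.listall</emphasis> with three arguments should generate the lines <literal>Item: red</literal>,
<literal>Item: green</literal>, and <literal>Item: blue</literal>. Calling <emphasis role="bold">.pairs</emphasis> with four arguments
should generate <literal>a goes with b</literal> and <literal>c goes with d</literal>, because
<literal>.endeach 2</literal> advances by two arguments each time:
</para>
<literallayout class="monospaced">
 .macro listall
 .eacharg
 Item: $+1
 .endeach
 .endmacro

 .macro pairs
 .eacharg
 $+1 goes with $+2
 .endeach 2
 .endmacro

 .listall red green blue
 .pairs a b c d
</literallayout>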
</section>
<section>
<title>The <emphasis role="bold">.echo</emphasis> directive</title>
<para>
This directive takes a single string argument. It writes it to the standard
error stream. Within a macro, argument substitution takes place, but no other
processing is done on the string. This directive can be useful for debugging
macros or writing comments to the user.
</para>
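<para>
For example, when it is called as shown, a hypothetical debugging macro like
this one should write <literal>trace: first argument is hello</literal> to the standard error
stream. It adds nothing to the XML output:
</para>
<literallayout class="monospaced">
 .macro trace
 .echo "trace: first argument is $1"
 .endmacro

 .trace hello
</literallayout>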
</section>
<section>
<title>The <emphasis role="bold">.endarg</emphasis> directive</title>
<para>
See the description of <emphasis role="bold">.arg</emphasis> above.
</para>
</section>
<section>
<title>The <emphasis role="bold">.endeach</emphasis> directive</title>
<para>
See the description of <emphasis role="bold">.eacharg</emphasis> above.
</para>
</section>
<section>
<title>The <emphasis role="bold">.flag</emphasis> directive</title>
<para>
This directive is used to define flag sequences. The directive must be followed
either by a standalone flag sequence and one string in quotes, or by a flag
pair and two strings in quotes. White space separates these items. For example:
</para>
<literallayout class="monospaced">
 .flag &amp;&amp; "&amp;amp;"
 .flag &amp;" "&amp;  "&lt;quote&gt;"  "&lt;/quote&gt;"
</literallayout>
<para>
There are more examples in the definitions of the standard flags. If you
redefine an existing flag, the new definition overrides the old. There is no
way to revert to the previous definition.
</para>
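<para>
As a sketch of defining a new flag pair, the first line below assumes that
<literal>&amp;^</literal> is not already in use. It makes <literal>&amp;^...^&amp;</literal> generate an
<literal>&lt;acronym&gt;</literal> element, so the second line should yield
<literal>&lt;acronym&gt;XML&lt;/acronym&gt;</literal>:
</para>
<literallayout class="monospaced">
 .flag &amp;^ ^&amp; "&lt;acronym&gt;" "&lt;/acronym&gt;"
 Text in &amp;^XML^&amp; is easy to process.
</literallayout>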
</section>
<section>
<title>The <emphasis role="bold">.include</emphasis> directive</title>
<para>
This directive must be followed by a single string argument that is the path to
a file. The contents of the file are read and incorporated into the input at
this point. If the string does not contain any slashes, the path to the <emphasis>xfpt</emphasis>
library is prepended. Otherwise, the path is used unaltered.
</para>
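<para>
In the first of these lines, the name contains no slash, so the file is taken
from the <emphasis>xfpt</emphasis> library. The second line names a hypothetical file of local
macros; it contains a slash, so the path is used exactly as given. This means
that a file in the current directory must be given with a leading <literal>./</literal>,
because a name without a slash is looked for in the library:
</para>
<literallayout class="monospaced">
 .include stdflags
 .include ./mymacros
</literallayout>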
</section>
<section>
<title>The <emphasis role="bold">.literal</emphasis> directive</title>
<para>
This must be followed by one of the words <quote>layout</quote>, <quote>text</quote>, <quote>off</quote>, or
<quote>xml</quote>. It forces an end to a previous paragraph, if there is one, and then
switches between processing modes. The default mode is the <quote>off</quote> mode, in
which text is processed paragraph by paragraph, and flags are recognized.
Section <xref linkend="SECTliteralprocessing"/> describes how input lines are processed in
the four modes.
</para>
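<para>
For example, this fragment uses literal XML mode to copy a <literal>&lt;bookinfo&gt;</literal>
element to the output unchanged. It then returns to the default mode, in which
flags are recognized again. The title is illustrative only:
</para>
<literallayout class="monospaced">
 .literal xml
 &lt;bookinfo&gt;
 &lt;title&gt;A hypothetical book&lt;/title&gt;
 &lt;/bookinfo&gt;
 .literal off
 This paragraph contains an &amp;'emphasized'&amp; word.
</literallayout>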
</section>
<section id="SECTmacro">
<title>The <emphasis role="bold">.macro</emphasis> directive</title>
<para>
This directive is used to define macros. It must be followed by a macro name,
and then, optionally, by any number of arguments. The macro name can be any
sequence of non-whitespace characters. The arguments in the definition provide
default values. The following lines, up to <emphasis role="bold">.endmacro</emphasis>, form the body of the
macro. They are not processed in any way when the macro is defined; they are
processed only when the macro is called (see section <xref linkend="SECTcallingmacro"/>).
</para>
<para>
Within the body of a macro, argument substitutions can be specified by means of
a dollar character and an argument number, for example, <literal>$3</literal> for the third
argument. See also <emphasis role="bold">.eacharg</emphasis> above for the use of <literal>$+</literal> to refer to
relative arguments when looping through them. A reference to an argument that
is not supplied, and is not given a default, results in an empty substitution.
</para>
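<para>
For example, this hypothetical macro supplies default values for both of its
arguments:
</para>
<literallayout class="monospaced">
 .macro greet Hello World
 $1, $2!
 .endmacro

 .greet
 .greet Goodbye
 .greet Hi there
</literallayout>
<para>
The three calls should expand to <literal>Hello, World!</literal>, <literal>Goodbye, World!</literal>, and
<literal>Hi, there!</literal>. In the last call, the unquoted arguments are delimited by white
space. No blank lines separate the calls, so all three lines end up in the same
paragraph.
</para>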
<para>
There is also a facility for a conditional substitution. A reference to an
argument of the form:
</para>
<literallayout>
<literal>$=</literal><emphasis>&lt;digits&gt;&lt;delimiter&gt;&lt;text&gt;&lt;delimiter&gt;</emphasis>
</literallayout>
<para>
inserts the text if the argument is defined and is not an empty string, and
nothing otherwise. The text is itself scanned for flags and argument
substitutions. The delimiter must be a single character that does not appear in
the text. For example:
</para>
<literallayout class="monospaced">
&amp;&lt;chapter$=2+ id="$2"+&gt;&amp;
</literallayout>
<para>
If this appears in a macro that is called with only one argument, the result
is:
</para>
<literallayout class="monospaced">
&lt;chapter&gt;
</literallayout>
<para>
but if the second argument is, say, <literal>abcd</literal>, the result is:
</para>
<literallayout class="monospaced">
&lt;chapter id="abcd"&gt;
</literallayout>
<para>
This conditional feature can be used with both absolute and relative argument
references.
</para>
<para>
If a dollar character is required as data within the body of a macro, it must 
be doubled. For example:
</para>
<literallayout class="monospaced">
 .macro price
 The price is $$1.
 .endmacro
</literallayout>
<para>
If you redefine an existing macro, the new definition overrides the old. There
is no way to revert to the previous definition. If you define a macro whose
name is the same as the name of a built-in directive, you will not be able to
call it, because <emphasis>xfpt</emphasis> looks for built-in directives before it looks for macros.
</para>
<para>
It is possible to define a macro within a macro, though clearly care must be 
taken with argument references to ensure that substitutions happen at the right
level.
</para>
</section>
<section>
<title>The <emphasis role="bold">.pop</emphasis> directive</title>
<para>
<emphasis>xfpt</emphasis> keeps a stack of text strings that are manipulated by the <emphasis role="bold">.push</emphasis> and
<emphasis role="bold">.pop</emphasis> directives. When the end of the input is reached, any strings that
remain on the stack are popped off, processed for flags, and written to the
output.
</para>
<para>
Each string on the stack may, optionally, be associated with an upper case
letter. If <emphasis role="bold">.pop</emphasis> is followed by an upper case letter, it searches down the
stack for a string with the same letter. If it cannot find one, it does
nothing. Otherwise, it pops off, processes, and writes out all the strings down
to and including the one that matches.
</para>
<para>
If <emphasis role="bold">.pop</emphasis> is given without a following letter, it pops one string off the
stack and writes it out. If there is nothing on the stack, an error occurs.
</para>
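<para>
As an illustration, suppose that the standard flags are defined and these lines
are processed in turn:
</para>
<literallayout class="monospaced">
 .push S &amp;&lt;/section&gt;&amp;
 .push L &amp;&lt;/itemizedlist&gt;&amp;
 .pop S
</literallayout>
<para>
The <literal>.pop S</literal> line should write <literal>&lt;/itemizedlist&gt;</literal> followed by
<literal>&lt;/section&gt;</literal>, leaving the stack as it was before the two pushes. A
subsequent <literal>.pop L</literal> would then do nothing, because no string associated with
<literal>L</literal> remains on the stack.
</para>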
</section>
<section>
<title>The <emphasis role="bold">.push</emphasis> directive</title>
<para>
This directive pushes a string onto the stack. If the rest of the directive line
starts with an upper case letter followed by white space, that letter is
associated with the string that is pushed, which consists of the rest of the
line. For example, the <emphasis role="bold">.chapter</emphasis> macro contains this line:
</para>
<literallayout class="monospaced">
 .push C &amp;&lt;/chapter&gt;&amp;
</literallayout>
<para>
Earlier in the macro there is the line:
</para>
<literallayout class="monospaced">
 .pop C
</literallayout>
<para>
This arrangement ensures that any previous chapter is terminated before
starting a new one, and also when the end of the input is reached.
</para>
</section>
<section id="SECTrevision">
<title>The <emphasis role="bold">.revision</emphasis> directive</title>
<para>
This directive is provided to make it easy to set the <literal>revisionflag</literal>
attribute on XML elements in a given portion of the document. The DocBook
specification states that the <literal>revisionflag</literal> attribute is common to all
elements.
</para>
<para>
The <emphasis role="bold">.revision</emphasis> directive must be followed by one of the words <quote>changed</quote>,
<quote>added</quote>, <quote>deleted</quote>, or <quote>off</quote>. For any value other than <quote>off</quote>, it causes
the internal variable <emphasis>xfpt.rev</emphasis> to be set to <literal>revisionflag=</literal> followed by
the given argument. If the argument is <quote>off</quote>, the internal variable is
emptied.
</para>
<para>
The contents of <emphasis>xfpt.rev</emphasis> are included in every <literal>&lt;para&gt;</literal> element that <emphasis>xfpt</emphasis>
generates. In addition, a number of the standard macros contain references to
<emphasis>xfpt.rev</emphasis> in appropriate places. Thus, setting:
</para>
<literallayout class="monospaced">
 .revision changed
</literallayout>
<para>
should cause all subsequent text to be marked up with <literal>revisionflag</literal>
attributes, until
</para>
<literallayout class="monospaced">
 .revision off
</literallayout>
<para>
is encountered. Unfortunately, at the time of writing, not all DocBook
processing software pays attention to the <literal>revisionflag</literal> attribute. 
Furthermore, some software grumbles that it is <quote>unexpected</quote> on some elements, 
though it does still seem to process it correctly.
</para>
<para>
For handling the most common case (setting and unsetting <quote>changed</quote>), the
standard macros <emphasis role="bold">.new</emphasis> and <emphasis role="bold">.wen</emphasis> are provided (see section
<xref linkend="SECTrevmacs"/>).
</para>
</section>
<section>
<title>The <emphasis role="bold">.set</emphasis> directive</title>
<para>
This directive must be followed by a name and a text string. It defines a user
variable with that name, whose value is the string. A reference to the name in
the style of an XML entity causes the string to be substituted, without further
processing. For example:
</para>
<literallayout class="monospaced">
 .set version 4.99
</literallayout>
<para>
This could be referenced as <literal>&amp;version;</literal>. If a variable is given the name of
an XML entity, you will not be able to refer to the XML entity, because user
variables take precedence. There is no way to delete a user variable after it
has been defined.
</para>
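<para>
As a sketch of how such a variable might be used (the sentence is purely
illustrative), consider this input line:
</para>
<literallayout class="monospaced">
 This manual describes version &amp;version; of the program.
</literallayout>
<para>
Given the <emphasis role="bold">.set</emphasis> line above, it should produce the text
<quote>This manual describes version 4.99 of the program.</quote> in the output
paragraph, because the string is substituted without further processing.
</para>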
</section>
</chapter>

<chapter id="CHAPstdmac">
<title>The standard macros for DocBook</title>
<titleabbrev>Standard macros</titleabbrev>
<para>
A set of simple macros for commonly needed DocBook features is provided in
<emphasis>xfpt</emphasis>&#x2019;s library. This may be extended as experience with <emphasis>xfpt</emphasis> accumulates. The
standard macros assume that the standard flags are defined, so a document that
is going to use these features should start with:
</para>
<literallayout class="monospaced">
 .include stdflags
 .include stdmacs
</literallayout>
<para>
All the standard macros except <emphasis role="bold">new</emphasis>, <emphasis role="bold">index</emphasis>, and <emphasis role="bold">url</emphasis> are intended to
be called as directive lines. Their names are therefore shown with a leading
dot in the discussion below.
</para>
<section>
<title>Overall setup</title>
<para>
There are two macros that should be used only once, at the start of the
document. The <emphasis role="bold">.docbook</emphasis> macro has no arguments. It inserts into the output
file the standard header material for a DocBook XML file, which is:
</para>
<literallayout class="monospaced">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;
</literallayout>
<para>
The <emphasis role="bold">.book</emphasis> macro has no arguments. It generates <literal>&lt;book&gt;</literal> and pushes
<literal>&lt;/book&gt;</literal> onto the stack so that it will be output at the end.
</para>
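<para>
Putting these together, a minimal document might begin as follows. This is a
sketch only. The chapter and section titles, the ID <literal>SECTbackground</literal>, and the
text are invented for illustration. The standard flags and macros are included
first, because <emphasis role="bold">.docbook</emphasis> and <emphasis role="bold">.book</emphasis> are themselves standard macros.
</para>
<literallayout class="monospaced">
 .include stdflags
 .include stdmacs
 .docbook
 .book

 .chapter "Introduction"
 This is the first paragraph of the introduction.

 .section "Background" "SECTbackground"
 Some background material.
</literallayout>
<para>
No explicit closing markup should be needed at the end. <emphasis role="bold">.book</emphasis>, and the
chapter and section macros described below, push their closing tags onto the
stack, and the stack is written out when the end of the input is reached.
</para>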
</section>
<section>
<title>Chapters, sections, and subsections</title>
<para>
Chapters, sections, and subsections are supported by three macros that all
operate in the same way. They are <emphasis role="bold">.chapter</emphasis>, <emphasis role="bold">.section</emphasis>, and
<emphasis role="bold">.subsection</emphasis>. They take one, two, or three arguments. The first
argument is the title. If a second argument is present, and is not an empty
string, it is set as an ID, and can be used in cross-references. For example:
</para>
<literallayout class="monospaced">
 .chapter "Introduction"
</literallayout>
<para>
sets no ID, but
</para>
<literallayout class="monospaced">
 .section "A section title" "SECTdemo"
</literallayout>
<para>
can be referenced from elsewhere in the document by a phrase such as:
</para>
<literallayout class="monospaced">
 see section &amp;&lt;&lt;SECTdemo&gt;&gt;&amp;
</literallayout>
<para>
When the title of a chapter or section is being used as a running head or foot
(for example), it may be too long to fit comfortably into the available space. 
DocBook provides the facility for a title abbreviation to be specified to deal
with this problem. If a third argument is given to one of these macros, it
causes a <literal>&lt;titleabbrev&gt;</literal> element to be generated. In this case, a second
argument must also be provided, but if you do not need an ID, the second
argument can be an empty string. For example:
</para>
<literallayout class="monospaced">
  .chapter "This chapter has quite a long title" "" "Long title"
</literallayout>
<para>
Where and when the abbreviation is used in place of the full title is
controlled by the stylesheet when the XML is processed.
</para>
<para>
These three macros use the stack to ensure that each chapter, section, and
subsection is terminated at the correct point. For example, starting a new
section automatically terminates an open subsection and a previous section.
</para>
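<para>
For illustration, consider a call that supplies all three arguments. It uses
this chapter's own title, ID, and abbreviation:
</para>
<literallayout class="monospaced">
 .chapter "The standard macros for DocBook" "CHAPstdmac" "Standard macros"
</literallayout>
<para>
This call would be expected to start the chapter with XML along these lines.
The exact layout of the output may differ.
</para>
<literallayout class="monospaced">
 &lt;chapter id="CHAPstdmac"&gt;
 &lt;title&gt;The standard macros for DocBook&lt;/title&gt;
 &lt;titleabbrev&gt;Standard macros&lt;/titleabbrev&gt;
</literallayout>
<para>
A cross-reference such as <literal>&amp;&lt;&lt;CHAPstdmac&gt;&gt;&amp;</literal> can then refer to the chapter by its ID.
</para>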
</section>
<section>
<title>URL references</title>
<para>
The <emphasis role="bold">url</emphasis> macro generates URL references, and is intended to be called inline
within the text that is being processed. It generates a <literal>&lt;ulink&gt;</literal> element,
and has either one or two arguments. The first argument is the URL, and the
second is the text that describes it. For example:
</para>
<literallayout class="monospaced">
 More details are &amp;url(http://x.example, here).
</literallayout>
<para>
This generates the following XML:
</para>
<literallayout class="monospaced">
 More details are &lt;ulink url="http://x.example"&gt;here&lt;/ulink&gt;.
</literallayout>
<para>
If the second argument is absent, the contents of the first argument are used
instead. If <emphasis role="bold">url</emphasis> is called as a directive, there will be a newline in the
output after <literal>&lt;/ulink&gt;</literal>. In most cases, such as the example above, this is
not wanted.
</para>
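<para>
For example, here is an input line with the second argument omitted:
</para>
<literallayout class="monospaced">
 See &amp;url(http://x.example) for more details.
</literallayout>
<para>
It should generate:
</para>
<literallayout class="monospaced">
 See &lt;ulink url="http://x.example"&gt;http://x.example&lt;/ulink&gt; for more details.
</literallayout>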
</section>
<section>
<title>Itemized lists</title>
<para>
The <emphasis role="bold">.ilist</emphasis> macro marks the start of an itemized list, the items of which
are normally rendered with bullets or similar markings. The macro can
optionally be called with one argument, for which there is no default. If the
argument is present, it is used to add a <literal>mark=</literal> attribute to the
<literal>&lt;itemizedlist&gt;</literal> element that is generated. The mark names that can be used
depend on the software that processes the resulting XML. For HTML output,
<quote>square</quote> and <quote>opencircle</quote> work in some browsers.
</para>
<para>
The text for the first item follows the macro call. The start of the next item
is indicated by the <emphasis role="bold">.next</emphasis> macro, and the end of the list by <emphasis role="bold">.endlist</emphasis>.
For example:
</para>
<literallayout class="monospaced">
 .ilist
 This is the first item.
 .next
 This is the next item.
 .endlist
</literallayout>
<para>
There may be more than one paragraph in an item. 
</para>
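<para>
The following sketch illustrates both the optional argument and an item with
more than one paragraph. It requests square bullets, although whether they
appear depends on the processing software, as noted above. It assumes that a
blank line separates paragraphs within an item.
</para>
<literallayout class="monospaced">
 .ilist square
 This is the first paragraph of the first item.

 This is the second paragraph of the first item.
 .next
 This is the next item.
 .endlist
</literallayout>
<para>
The generated <literal>&lt;itemizedlist&gt;</literal> element should carry the attribute <literal>mark="square"</literal>.
</para>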
</section>
<section>
<title>Ordered lists</title>
<para>
The <emphasis role="bold">.olist</emphasis> macro marks the start of an ordered list, the items of which are
numbered. If no argument is given, arabic numerals are used. One of the
following words can be given as the macro&#x2019;s argument to specify the numeration:
</para>
<literallayout>
<literal>arabic     </literal>   arabic numerals
<literal>loweralpha </literal>   lower case letters
<literal>lowerroman </literal>   lower case roman numerals
<literal>upperalpha </literal>   upper case letters
<literal>upperroman </literal>   upper case roman numerals
</literallayout>
<para>
The text for the first item follows the macro call. The start of the next item
is indicated by the <emphasis role="bold">.next</emphasis> macro, and the end of the list by <emphasis role="bold">.endlist</emphasis>.
For example:
</para>
<literallayout class="monospaced">
 .olist lowerroman
 This is the first item.
 .next
 This is the next item.
 .endlist
</literallayout>
<para>
There may be more than one paragraph in an item.
</para>
</section>
<section>
<title>Variable lists</title>
<para>
A variable list is one in which each entry is composed of a set of one or more
terms and an associated description. Typically, the terms are printed in a
style that makes them stand out, and the description is indented underneath.
The start of a variable list is indicated by the <emphasis role="bold">.vlist</emphasis> macro, which has
one optional argument. If present, it defines a title for the list.
</para>
<para>
Each entry is defined by a <emphasis role="bold">.vitem</emphasis> macro, whose arguments are the terms.
This is followed by the body of the entry. The list is terminated by the
<emphasis role="bold">.endlist</emphasis> macro. For example:
</para>
<literallayout class="monospaced">
 .vlist "Font filename extensions"
 .vitem "TTF"
 TrueType fonts.
 .vitem "PFA" "PFB"
 PostScript fonts.
 .endlist
</literallayout>
<para>
As for the other lists, there may be more than one paragraph in an item.
</para>
</section>
<section>
<title>Nested lists</title>
<para>
Lists may be nested as required. Some DocBook processors automatically choose
different bullets for nested itemized lists, but others do not. The
<emphasis role="bold">.endlist</emphasis> macro has no useful arguments. Any text that follows it is
treated as a comment. This can provide an annotation facility that may make the
input easier to understand when lists are nested.
</para>
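<para>
Here is a sketch of an ordered list nested inside an itemized list. The comment
text after each <emphasis role="bold">.endlist</emphasis> records which list that call closes. The item
text is invented for illustration.
</para>
<literallayout class="monospaced">
 .ilist
 First outer item, which contains a numbered list:

 .olist loweralpha
 First inner item.
 .next
 Second inner item.
 .endlist inner ordered list
 .next
 Second outer item.
 .endlist outer itemized list
</literallayout>
<para>
The first <emphasis role="bold">.endlist</emphasis> should end the ordered list, so the following <emphasis role="bold">.next</emphasis>
starts the second item of the itemized list. The second <emphasis role="bold">.endlist</emphasis> ends the
itemized list. The trailing words are comments and have no effect on the output.
</para>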
</section>
<section>
<title>Displayed text</title>
<para>
In displayed text each non-directive input line generates one output line. The
<literal>&lt;literallayout&gt;</literal> DocBook element is used to achieve this. Two kinds of
displayed text are supported by the standard macros. They differ in their
handling of the text itself.
</para>
<para>
The macro <emphasis role="bold">.display</emphasis> is followed by lines that are processed in the same way
as normal paragraphs: flags are interpreted, so font changes and other inline
markup can be used. The lines are processed in literal layout mode. For example:
</para>
<literallayout class="monospaced">
 .display
 &amp;`-o`&amp;   set output destination
 &amp;`-S`&amp;   set library path
 .endd
</literallayout>
<para>
The output is as follows:
</para>
<literallayout>
 <literal>-o</literal>   set output destination
 <literal>-S</literal>   set library path
</literallayout>
<para>
The macro <emphasis role="bold">.code</emphasis> is followed by lines that are not processed in any way, except
to turn ampersands and angle brackets into XML entities. The lines are
processed in literal text mode. In addition, <literal>class="monospaced"</literal> is added to
the <literal>&lt;literallayout&gt;</literal> element, so that the lines are displayed in a
monospaced font. For example:
</para>
<literallayout class="monospaced">
 .code
 z = sqrt(x*x + y*y);
 .endd
</literallayout>
<para>
As the examples illustrate, both kinds of display are terminated by the
<emphasis role="bold">.endd</emphasis> macro.
</para>
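<para>
To illustrate the entity conversion, consider a <emphasis role="bold">.code</emphasis> display whose
line contains characters that are special in XML. The C fragment is only an
example.
</para>
<literallayout class="monospaced">
 .code
 if (a &lt; b &amp;&amp; c &gt; 0) return;
 .endd
</literallayout>
<para>
Because the line is not otherwise processed, the ampersands are not interpreted
as flags. The output should be along these lines:
</para>
<literallayout class="monospaced">
 &lt;literallayout class="monospaced"&gt;
 if (a &amp;lt; b &amp;amp;&amp;amp; c &amp;gt; 0) return;
 &lt;/literallayout&gt;
</literallayout>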
</section>
<section>
<title>Block quotes</title>
<para>
The macro pair <emphasis role="bold">.blockquote</emphasis> and <emphasis role="bold">.endblockquote</emphasis> is used to wrap the
lines between them in a <literal>&lt;blockquote&gt;</literal> element.
</para>
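<para>
For example (the quoted text is illustrative):
</para>
<literallayout class="monospaced">
 .blockquote
 This text is set off from the surrounding paragraphs as a quotation.
 .endblockquote
</literallayout>
<para>
The line between the two macro calls is presumably processed as ordinary
paragraph text, and the result is wrapped in a <literal>&lt;blockquote&gt;</literal> element.
</para>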
</section>
<section id="SECTrevmacs">
<title>Revision markings</title>
<para>
Two macros are provided to simplify setting and unsetting the <quote>changed</quote>
revision marking (see section <xref linkend="SECTrevision"/>). When the revised text is
substantial (for example, a complete paragraph, table, display, or section), it
can be placed between <emphasis role="bold">.new</emphasis> and <emphasis role="bold">.wen</emphasis>, as in this example:
</para>
<literallayout class="monospaced">
  This paragraph is not flagged as changed.
  .new
  This is a changed paragraph that contains a display:
  .display
  whatever
  .endd
  This is the next paragraph.
  .wen      
  Here is the next, unmarked, paragraph.
</literallayout>
<para>
When called like this, without an argument, <emphasis role="bold">.new</emphasis> terminates the current
paragraph, and <emphasis role="bold">.wen</emphasis> always does so. Therefore, even though there are no
blank lines before <emphasis role="bold">.new</emphasis> or <emphasis role="bold">.wen</emphasis> above, the revised text will end up in
a paragraph of its own. (You can, of course, put in blank lines if you wish.)
</para>
<para>
If you want to indicate that just a few words inside a paragraph are revised, you
can call the <emphasis role="bold">new</emphasis> macro with an argument. The macro can be called either as
a directive or inline:
</para>
<literallayout class="monospaced">
  This is a paragraph that has 
  .new "a few marked words" 
  within it. Here are &amp;new(some more) marked words.
</literallayout>
<para>
The effect of this is to generate a <literal>&lt;phrase&gt;</literal> XML element with the
<literal>revisionflag</literal> attribute set. The <emphasis role="bold">.wen</emphasis> macro is not used in this case.
</para>
</section>
<section>
<title>Informal tables</title>
<para>
The <emphasis role="bold">.itable</emphasis> macro starts an informal (untitled) table with some basic
parameterization. If you are working on a large document that has many tables
with the same parameters, the best approach is to define your own table macros,
possibly calling the standard one with specific arguments.
</para>
<para>
The <emphasis role="bold">.itable</emphasis> macro has four basic arguments:
</para>
<orderedlist numeration="arabic">
<listitem>
<para>
The frame requirement for the table, which may be one of the words <quote>all</quote>,
<quote>bottom</quote>, <quote>none</quote> (the default), <quote>sides</quote>, <quote>top</quote>, or <quote>topbot</quote>.
</para>
</listitem>
<listitem>
<para>
The <quote>colsep</quote> value for the table. The default is <quote>0</quote>, meaning no vertical
separator lines between columns. The value <quote>1</quote> requests vertical separator
lines.
</para>
</listitem>
<listitem>
<para>
The <quote>rowsep</quote> value for the table. The default is <quote>0</quote>, meaning no horizontal
lines between rows. The value <quote>1</quote> requests horizontal separator lines.
</para>
</listitem>
<listitem>
<para>
The number of columns.
</para>
</listitem>
</orderedlist>
<para>
These arguments must be followed by two arguments for each column. The first
specifies the column width, and the second its alignment. A column width can be
specified as an absolute dimension such as 36pt or 2in, or as a proportional
measure, which has the form of a number followed by an asterisk. The two forms
can be mixed &ndash; see the DocBook specification for details.
</para>
<para>
Straightforward column alignments can be specified as <quote>center</quote>, <quote>left</quote>, or
<quote>right</quote>. DocBook also has some other possibilities, but sadly they do not 
seem to include <quote>centre</quote>.
</para>
<para>
Each row of the table is specified using a <emphasis role="bold">.row</emphasis> macro; the entries in
the row are the macro&#x2019;s arguments. The table is terminated by <emphasis role="bold">.endtable</emphasis>,
which has no arguments. For example:
</para>
<literallayout class="monospaced">
 .itable all 1 1 2 1in left 2in center
 .row "cell 11" "cell 12"
 .row "cell 21" "cell 22"
 .endtable
</literallayout>
<para>
This specifies a framed table, with both column and row separator lines. There
are two columns: the first is one inch wide and left aligned, and the second is
two inches wide and centred. There are two rows. The output looks like this:
</para>
<informaltable frame="all">
<tgroup cols="2" colsep="1" rowsep="1">
<colspec colwidth="1in" align="left"/>
<colspec colwidth="2in" align="center"/>
<tbody>
<row>
<entry>cell 11</entry>
<entry>cell 12</entry>
</row>
<row>
<entry>cell 21</entry>
<entry>cell 22</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
The <emphasis role="bold">.row</emphasis> macro does not set the <literal>revisionflag</literal> attribute in the
<literal>&lt;entry&gt;</literal> elements that it generates, because all current XML processors
appear to ignore it there. However, you can use an inline call of the <emphasis role="bold">new</emphasis>
macro within an entry to generate a <literal>&lt;phrase&gt;</literal> element with <literal>revisionflag</literal>
set.
</para>
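<para>
As a further sketch, proportional column widths can be used instead of absolute
ones. The following call has invented cell contents. It requests rules only at
the top and bottom of the table, and horizontal separator lines between rows but
no vertical ones. It also asks for three columns whose widths are in the ratio
1:2:1, so the middle column should take about half of the available width.
</para>
<literallayout class="monospaced">
 .itable topbot 0 1 3 1* left 2* left 1* right
 .row "Name" "Description" "Size"
 .row "alpha" "The first entry" "10"
 .row "beta" "The second entry" "200"
 .endtable
</literallayout>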
</section>
<section>
<title>Indexes</title>
<para>
The <emphasis role="bold">.index</emphasis> macro generates <literal>&lt;indexterm&gt;</literal> elements (index entries) in the
output. It takes one or two arguments. The first is the text for the primary
index term, and the second, if present, specifies a secondary index term. This
macro can be called either from a directive line, or inline. However, it is
mostly called as a directive, at the start of a relevant paragraph. For
example:
</para>
<literallayout class="monospaced">
 .index goose "wild chase"
 The chasing of wild geese...
</literallayout>
<para>
You can generate <quote>see</quote> and <quote>see also</quote> index entries by using <emphasis role="bold">.index-see</emphasis> 
and <emphasis role="bold">.index-seealso</emphasis> instead of <emphasis role="bold">.index</emphasis>.
</para>
<para>
If you want to generate an index entry for a range of pages, you can use the 
<emphasis role="bold">.index-from</emphasis> and <emphasis role="bold">.index-to</emphasis> macros. The first argument of each of them is 
an ID that ties them together. The second and third arguments of 
<emphasis role="bold">.index-from</emphasis> are the primary and secondary index items. For example:
</para>
<literallayout class="monospaced">
 .index-from "ID5" "indexes" "handling ranges"
 ... &lt;lines of text&gt; ...
 .index-to "ID5"
</literallayout>
<para>
The <emphasis role="bold">.makeindex</emphasis> macro should be called at the end of the document, at the
point where you want an index to be generated. It can have up to two
arguments. The first is the title for the index, for which the default is
<quote>Index</quote>. The second, if present, causes a <literal>role=</literal> attribute to be added to
the <literal>&lt;index&gt;</literal> element that is generated. For this to be useful, you need to
generate <literal>&lt;indexterm&gt;</literal> elements that have similar <literal>role=</literal> attributes. The
standard <emphasis role="bold">index</emphasis> macro cannot do this. If you want to generate multiple
indexes using this mechanism, it is best to define your own macros for each
index type. For example:
</para>
<literallayout class="monospaced">
 .macro cindex
 &amp;&lt;indexterm role="concept"&gt;&amp;
 &amp;&lt;primary&gt;&amp;$1&amp;&lt;/primary&gt;&amp;
 .arg 2
 &amp;&lt;secondary&gt;&amp;$2&amp;&lt;/secondary&gt;&amp;
 .endarg
 &amp;&lt;/indexterm&gt;&amp;
 .endmacro
</literallayout>
<para>
This defines a <emphasis role="bold">.cindex</emphasis> macro for the <quote>concept</quote> index. At the end of the 
document you might have:
</para>
<literallayout class="monospaced">
 .makeindex "Concept index" "concept"
 .makeindex
</literallayout>
<para>
As long as the processing software can handle multiple indexes, this causes two
indexes to be generated. The first is entitled <quote>Concept index</quote>, and contains 
only those index entries that were generated by the <emphasis role="bold">.cindex</emphasis> macro. The 
second contains all index entries.
</para>
</section>
</chapter>

</book>
