TeXML is an XML syntax for TeX. A processor translates TeXML source into TeX.
The Document Type Definition (DTD) for TeXML can be found in a TeXML distribution package.
TeXML elements: TeXML, cmd, env, group, math and dmath, ctrl, spec, pdf.
TeXML
<?xml version="1.0" encoding="..."?> <TeXML> ... your content here ... </TeXML>
The root element of a TeXML document is the element TeXML.
cmd
TeXML: <cmd name="documentclass"> <opt>12pt</opt> <parm>letter</parm> </cmd>
TeX: \documentclass[12pt]{letter}
The TeXML cmd element encodes TeX commands.
Options are given as opt children of the cmd element. The processor places each opt child within square brackets, as a LaTeX-style option. Parameters are given as parm children of the cmd element. The processor places each parm child within a TeX group, that is, curly braces. A cmd element can have several parm or opt elements.
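A minimal Python sketch of this rule, using the standard library XML parser. The function cmd_to_tex is hypothetical, not the TeXML processor's API. It assumes the opt and parm children hold plain text that needs no escaping, and that children are written in document order, which matters for commands such as \newcommand whose optional argument sits between two mandatory ones. Whitespace between the children is ignored, as the processor deletes whitespace located directly in cmd.

```python
import xml.etree.ElementTree as ET


def cmd_to_tex(cmd: ET.Element) -> str:
    """Hypothetical sketch: render a <cmd> element with text-only children."""
    parts = ["\\" + cmd.attrib["name"]]
    for child in cmd:  # children in document order; their .tail whitespace is skipped
        text = "".join(child.itertext())
        if child.tag == "opt":
            parts.append("[" + text + "]")  # opt -> square brackets
        elif child.tag == "parm":
            parts.append("{" + text + "}")  # parm -> TeX group
        else:
            raise ValueError(f"unsupported child <{child.tag}>")
    return "".join(parts)


source = '<cmd name="documentclass"> <opt>12pt</opt> <parm>letter</parm> </cmd>'
print(cmd_to_tex(ET.fromstring(source)))  # \documentclass[12pt]{letter}
```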
env
TeXML: <env name="document"> ... </env>
TeX: \begin{document} ... \end{document}
The element env is a convenience for expressing LaTeX environments.
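A minimal Python sketch of env; env_to_tex is a hypothetical helper that wraps already-translated body text. The processor also puts new lines around the beginning and the end of an environment. The sketch models that with four boolean flags named after the attributes nl1 to nl4, all on by default; how the real attributes are valued and how they merge with neighbouring newlines is not modelled.

```python
def _nl(flag: bool) -> str:
    return "\n" if flag else ""


def env_to_tex(name: str, body: str, nl1: bool = True, nl2: bool = True,
               nl3: bool = True, nl4: bool = True) -> str:
    # Hypothetical sketch: body is TeX that has already been translated.
    # nl1: before \begin, nl2: after \begin, nl3: before \end, nl4: after \end.
    return (_nl(nl1) + "\\begin{" + name + "}" + _nl(nl2)
            + body
            + _nl(nl3) + "\\end{" + name + "}" + _nl(nl4))


print(env_to_tex("document", "Hello", False, False, False, False))
# \begin{document}Hello\end{document}
```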
group
TeXML: <group><cmd name="it"/>italics</group>
TeX: {\it italics}
The group element is a convenience for encoding groups. The processor will supply an opening brace at the beginning, and a closing brace at the end of the group.
math and dmath
TeXML: <math>a+b</math> <dmath><cmd name="sqrt"><parm>2</parm></cmd></dmath>
TeX: $a+b$ $$\sqrt{2}$$
Elements math and dmath are conveniences for encoding math groups. The processor inserts the appropriate math shift symbol at the beginning and end of the group and also switches mode to math inside the group.
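A small Python sketch of the two steps, with hypothetical helpers translate and math_to_tex: the content is translated in math mode and then wrapped in the math shift symbols. Only the characters < and > are translated here, using their text-mode and math-mode replacements from the escaping table in this document; the real processor handles every special character.

```python
# Hypothetical sketch: (text mode, math mode) replacements for two characters.
LT_GT = {"<": ("\\textless{}", "<"), ">": ("\\textgreater{}", ">")}


def translate(text: str, mode: str) -> str:
    index = 0 if mode == "text" else 1
    return "".join(LT_GT.get(ch, (ch, ch))[index] for ch in text)


def math_to_tex(content: str, display: bool = False) -> str:
    # math -> $...$, dmath -> $$...$$; the content is translated in math mode.
    shift = "$$" if display else "$"
    return shift + translate(content, "math") + shift


print(translate("a<b", "text"))  # a\textless{}b
print(math_to_tex("a<b"))        # $a<b$
```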
ctrl
TeXML: line1<ctrl ch="\"/>line2
TeX: line1\\line2
The ch attribute of the ctrl element encodes a control symbol.
spec
TeXML: <spec cat="vert"/>l<spec cat="vert"/>
TeX: |l|
The cat attribute of the element spec selects a special symbol, which the processor outputs verbatim, without escaping:
| description | cat value | output |
|---|---|---|
| escape character | esc | \ |
| begin group | bg | { |
| end group | eg | } |
| math shift | mshift | $ |
| alignment tab | align | & |
| parameter | parm | # |
| superscript | sup | ^ |
| subscript | sub | _ |
| tilde | tilde | ~ |
| comment | comment | % |
| vertical line | vert | \| |
| less than | lt | < |
| greater than | gt | > |
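The table maps directly onto a dictionary. This Python sketch uses the hypothetical names SPEC_CATEGORIES and spec_to_tex and covers only the categories listed in the table; the whitespace categories nl, nl?, space and nil are left out.

```python
# Hypothetical sketch: cat value -> symbol written verbatim (never escaped).
SPEC_CATEGORIES = {
    "esc": "\\", "bg": "{", "eg": "}", "mshift": "$",
    "align": "&", "parm": "#", "sup": "^", "sub": "_",
    "tilde": "~", "comment": "%", "vert": "|", "lt": "<", "gt": ">",
}


def spec_to_tex(cat: str) -> str:
    try:
        return SPEC_CATEGORIES[cat]
    except KeyError:
        raise ValueError(f"unknown spec category: {cat!r}") from None


# <spec cat="vert"/>l<spec cat="vert"/>
print(spec_to_tex("vert") + "l" + spec_to_tex("vert"))  # |l|
```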
pdf
TeXML: <pdf>τεχ</pdf>
TeX: \003\304\003\265\003\307
The content of the element pdf is converted to UTF-16BE, and each byte is written as an escaped three-digit octal code. The result is a PDF Unicode string.
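A minimal Python sketch of this conversion; the function name pdf_string is hypothetical. Python's utf-16-be codec writes no byte-order mark, which matches the example output above.

```python
def pdf_string(text: str) -> str:
    # Hypothetical sketch: UTF-16BE bytes, each as a three-digit octal escape.
    data = text.encode("utf-16-be")  # the -be codec emits no byte-order mark
    return "".join(f"\\{byte:03o}" for byte in data)


print(pdf_string("τεχ"))  # \003\304\003\265\003\307
```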
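Characters are processed as follows. By default, TeX special characters are escaped. To leave specials as is, without escaping, use the TeXML attribute escape:
<TeXML escape="0">...</TeXML>
With escaping enabled, each special character is replaced according to the current mode: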
| symbol | text mode | math mode |
|---|---|---|
| \ | \textbackslash{} | \backslash{} |
| { | \{ | \{ |
| } | \} | \} |
| $ | \textdollar{} | \$ |
| & | \& | \& |
| # | \# | \# |
| ^ | \^{} | \^{} |
| _ | \_ | \_ |
| ~ | \textasciitilde{} | \~{} |
| % | \% | \% |
| \| | \textbar{} | \| |
| < | \textless{} | < |
| > | \textgreater{} | > |
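A minimal Python sketch of this table as a lookup keyed by mode. The names ESCAPES and escape are hypothetical; the sketch ignores the escape="0" switch and the replacement of other Unicode characters.

```python
# Hypothetical sketch: special character -> (text mode, math mode) replacement.
ESCAPES = {
    "\\": ("\\textbackslash{}", "\\backslash{}"),
    "{": ("\\{", "\\{"),
    "}": ("\\}", "\\}"),
    "$": ("\\textdollar{}", "\\$"),
    "&": ("\\&", "\\&"),
    "#": ("\\#", "\\#"),
    "^": ("\\^{}", "\\^{}"),
    "_": ("\\_", "\\_"),
    "~": ("\\textasciitilde{}", "\\~{}"),
    "%": ("\\%", "\\%"),
    "|": ("\\textbar{}", "|"),
    "<": ("\\textless{}", "<"),
    ">": ("\\textgreater{}", ">"),
}


def escape(text: str, mode: str = "text") -> str:
    if mode not in ("text", "math"):
        raise ValueError(f"unknown mode: {mode!r}")
    column = 0 if mode == "text" else 1
    return "".join(ESCAPES[ch][column] if ch in ESCAPES else ch for ch in text)


print(escape("50% & $5"))  # 50\% \& \textdollar{}5
```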
Other Unicode characters are replaced by LaTeX commands. The mapping table is generated automatically from the file unicode.xml, an appendix to the W3C MathML specification.
If the replacement for a Unicode character is valid only in math mode and the current mode is text, the replacement is wrapped in the command “\ensuremath”. Likewise, if a replacement is valid only in text mode and the current mode is math, the wrapper “\ensuretext” is used.
LaTeX does not provide the command “\ensuretext”, so you have to define it yourself. One approach is:
\def\ensuretext{\textrm}
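A Python sketch of the wrapping rule. The function wrap_replacement and its valid_in argument are hypothetical, and the entries used in the example (\alpha as math-only, \textbullet{} as text-only) stand in for entries of the generated mapping table.

```python
from typing import Optional


def wrap_replacement(replacement: str, valid_in: Optional[str], mode: str) -> str:
    """Hypothetical sketch; valid_in is "math", "text" or None (both modes)."""
    if valid_in == "math" and mode == "text":
        return "\\ensuremath{" + replacement + "}"
    if valid_in == "text" and mode == "math":
        return "\\ensuretext{" + replacement + "}"
    return replacement


print(wrap_replacement("\\alpha", "math", "text"))  # \ensuremath{\alpha}
```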
Empty lines have a special meaning for TeX: they automatically generate the TeX command \par. To avoid this, the processor outputs a line containing only the symbol % (a TeX comment) instead of an empty line.
To leave empty lines as is, use the TeXML attribute emptylines:
<TeXML emptylines="1">...</TeXML>
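A Python sketch of this rule applied to finished TeX output; protect_empty_lines is a hypothetical name. It also treats whitespace-only lines as empty, because TeX skips leading spaces on a line and so produces \par for those lines too. With emptylines="1", the equivalent here would be not calling the function.

```python
def protect_empty_lines(tex: str) -> str:
    # Hypothetical sketch: replace each blank line with a lone "%".
    if not tex:
        return tex
    trailing = tex.endswith("\n")
    body = tex[:-1] if trailing else tex
    lines = ["%" if not line.strip() else line for line in body.split("\n")]
    return "\n".join(lines) + ("\n" if trailing else "")


print(protect_empty_lines("first\n\nsecond\n"), end="")
# first
# %
# second
```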
The TeXML processor breaks up the well-known ligatures “--”, “---”, “``”, “''”, “!`” and “?`”. These ligatures are converted into “-{}-”, “-{}-{}-”, “`{}`”, “'{}'”, “!{}`”, and “?{}`” respectively.
To leave ligatures as is, use the TeXML attribute ligatures:
<TeXML ligatures="1">...</TeXML>
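A Python sketch of this substitution; break_ligatures is a hypothetical name. It inserts an empty group between the two characters of each ligature pair, so a run of three hyphens becomes -{}-{}-, as listed above.

```python
# Hypothetical sketch: character pairs that TeX fonts combine into ligatures.
LIGATURE_PAIRS = {("-", "-"), ("`", "`"), ("'", "'"), ("!", "`"), ("?", "`")}


def break_ligatures(text: str) -> str:
    out = []
    prev = ""
    for ch in text:
        if (prev, ch) in LIGATURE_PAIRS:
            out.append("{}")  # the empty group separates the pair
        out.append(ch)
        prev = ch
    return "".join(out)


print(break_ligatures("``quoted''---done"))  # `{}`quoted'{}'-{}-{}-done
```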
There are two modes: text and math. Modes only affect the translation of characters.
The default mode is text. In order to change mode, use the mode attribute of the element TeXML. The possible values for this attribute are math and text. If the element TeXML is used without attribute mode, then the mode is not changed.
<TeXML mode="math"> ... math mode here ... <TeXML mode="text">... text mode here ...</TeXML> </TeXML>
Elements math and dmath also change mode to math.
The TeXML processor performs advanced whitespace processing. The program
If something goes wrong, you can switch off whitespace elimination using the ws attribute of the TeXML element.
<TeXML ws="1"> ... whitespace is verbatim here ... </TeXML>
If the TeXML elements ctrl or spec have any content (including whitespace) then the TeXML processor reports an error.
The program deletes any whitespace that is located directly in the TeXML element cmd.
Insignificant whitespace is whitespace around any opening or closing tag, for example, whitespace around “... <TeXML> ...” and “... </TeXML> ...”. The XML reader converts insignificant whitespace into a weak space.
Another source of weak spaces is TeX commands. When the processor converts “<cmd name="it"/>” into “\it ”, the space after “\it” is a weak space.
The TeX writer processes weak spaces in the following manner:
The resulting documents are usually very good, but some tuning can make them even better. This section describes how whitespace is handled and gives some hints for making the resulting documents look as good as handcrafted ones.
If a command has neither parameters nor options, the TeXML processor adds an empty group “{}” after the command name: “\smth{}”. Without the empty group, TeX ignores the whitespace that follows the command, and sometimes that is exactly what you need. In this case, set the attribute “gr” (short for “group”) to “0”.
TeXML: <cmd name="it"/> once, <cmd name="it" gr="0"/> twice
TeX: \it{} once, \it twice
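A tiny Python sketch of the rule for a command without parameters or options; the helper bare_cmd is hypothetical, and gr=False stands for gr="0". It reproduces the output above. In the second case TeX treats the space after \it as the end of the control word and drops it.

```python
def bare_cmd(name: str, gr: bool = True) -> str:
    # Hypothetical sketch of a <cmd> with no parm/opt children:
    # "{}" is appended unless gr="0" (gr=False here).
    return "\\" + name + ("{}" if gr else "")


print(bare_cmd("it") + " once, " + bare_cmd("it", gr=False) + " twice")
# \it{} once, \it twice
```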
It's difficult to work with documents that are one long line as a result of transformation, so the TeXML processor performs automatic line breaking.
The processor breaks a line at a space once the line is “far enough” from its start. By default, “far enough” is 62 characters. You can set another value with the command-line option “-w” or “--width”. This setting is not strict: a line can be much longer than the specified width if it contains no spaces.
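A Python sketch of width-based line breaking, under the assumption that the first space after the width is reached becomes a newline; break_lines is a hypothetical name and this is not the processor's actual algorithm. Because TeX reads an end of line much like a space, the swap is harmless in ordinary text. The sketch also refuses to break inside a % comment, where a newline would end the comment early and turn the rest of it into text.

```python
def break_lines(tex: str, width: int = 62) -> str:
    # Hypothetical sketch of greedy breaking at spaces.
    out = []
    column = 0          # characters written on the current output line
    in_comment = False  # True after an unescaped % until the end of the line
    prev = ""
    for ch in tex:
        if ch == "\n":
            out.append(ch)
            column, in_comment = 0, False
        elif ch == " " and column >= width and not in_comment:
            out.append("\n")  # this space becomes a line break
            column = 0
        else:
            # Simplification: a % right after any backslash counts as escaped.
            if ch == "%" and prev != "\\":
                in_comment = True
            out.append(ch)
            column += 1
        prev = ch
    return "".join(out)


print(break_lines("aaa bbb ccc", width=3))
# aaa
# bbb
# ccc
```

A line without spaces, such as a long URL, is never broken, which matches the note that the width is not strict.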
The attributes nl1 and nl2 of the element cmd force a new line before (nl1) or after (nl2) the TeX command.
The TeXML processor automatically creates new lines around the beginning and the end of an environment. You can change this behaviour using four attributes: nl1 (before the beginning), nl2 (after the beginning), nl3 (before the end) and nl4 (after the end).
You can affect whitespace output by using special categories of the element spec: nl, nl?, space and nil.
The TeXML namespace is http://getfo.sourceforge.net/texml/ns1.
<TeXML xmlns="http://getfo.sourceforge.net/texml/ns1"> ... </TeXML>
In the ConTeXt mode, the element env creates ConTeXt environments.
TeXML: <env name="document"> ... </env>
TeX: \begindocument ... \enddocument
To activate ConTeXt mode, give the command line option -c or --context to the TeXML processor.