Unix Power Tools

45.13. Formatting Markup Languages -- troff, LaTeX, HTML, and So On

Section 45.12 shows an example of a simple formatting markup language: the one used by man via nroff. Don't laugh -- it may seem arcane, but it is fairly simple. Like all markup languages, it attempts to abstract out certain things, so that you can describe what you'd like the end result to look like. Manpages are simple to describe, so the markup language for them is relatively simple.

Full troff is somewhat more complex, both because it can express far more complex ideas and because it lets you define macros that extend the core markup language. Similarly, TeX (pronounced "tek") is essentially a programming language for typesetting. It provides a very thorough model of typesetting and the ability to, essentially, write programs that generate the output you want.

Available on top of TeX is LaTeX (pronounced "lah-tek" or "lay-tek"), a complex macro package focused on general document writing. It allows you to describe the general structure of your document and let LaTeX (and underneath, TeX) sort out the "proper" way to typeset that structure. Working with this sort of markup is very different from working in a WYSIWYG word processor, where you have to do all of the formatting yourself. As an example, a simple LaTeX document looks something like this (taken from The Not So Short Introduction to LaTeX2e):

\documentclass[a4paper,11pt]{article}
% define the title
\author{H.~Partl}
\title{Minimalism}
\begin{document}
% generates the title
\maketitle
% insert the table of contents
\tableofcontents
\section{Start}
Well, and here begins my lovely article.
\section{End}
\ldots{} and here it ends.
\end{document}

Much like the nroff input earlier, this describes the structure of the document by inserting commands into the text at appropriate places. The LyX editor (http://www.lyx.org) sits on top of LaTeX and provides what its developers call What You See Is What You Mean (WYSIWYM, or whiz-ee-whim) editing. Lots of information about TeX and LaTeX is available at the TeX Users' Group web site, http://www.tug.org. TeX software is available via the Comprehensive TeX Archive Network, or CTAN, at http://www.ctan.org. I strongly recommend the teTeX distribution as a simple way to get a complete installation of everything you need in one swell foop.
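Because the structure is marked explicitly with commands, a program can recover it from the source text. The following is only a rough sketch in Python, not how LaTeX itself builds a table of contents: the helper name extract_outline is hypothetical, and it assumes un-nested \title{...} and \section{...} commands like the ones in the sample document above.

```python
import re

# The sample document from above, kept in a raw string so the
# backslashes are not treated as Python escape sequences.
LATEX_SOURCE = r"""
\documentclass[a4paper,11pt]{article}
% define the title
\author{H.~Partl}
\title{Minimalism}
\begin{document}
% generates the title
\maketitle
% insert the table of contents
\tableofcontents
\section{Start}
Well, and here begins my lovely article.
\section{End}
\ldots{} and here it ends.
\end{document}
"""


def extract_outline(source):
    """Return (title, list of section names) from simple LaTeX source.

    Hypothetical helper for illustration only: it does not handle
    nested braces, commented-out commands, or subsections.
    """
    title_match = re.search(r"\\title\{([^}]*)\}", source)
    title = title_match.group(1) if title_match else None
    sections = re.findall(r"\\section\{([^}]*)\}", source)
    return title, sections


if __name__ == "__main__":
    title, sections = extract_outline(LATEX_SOURCE)
    print(title)
    for number, name in enumerate(sections, start=1):
        print(number, name)
```

Run as a script, this prints the title followed by a numbered list of the two section names, which is roughly the information \tableofcontents needs.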

In contrast, while HTML is also a markup language, its markup is focused primarily on display and hypertext references rather than internal document structure. HTML is an application of SGML; you probably know about it already because it is the primary display markup language used on the web. The following is essentially the same as the sample LaTeX document, but marked up using HTML formatting:

<html>
<head>
<title>Minimalism</title>
</head>
<body>
<h1>Minimalism</h1>
...table of contents...
<h2>Start</h2>
<p>Well, and here begins my lovely article.</p>
<h2>End</h2>
<p>&hellip; and here it ends.</p>
</body>
</html>
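One way to see that the two samples carry the same content is to generate both from a single description of the document. The Python below is a minimal sketch under stated assumptions: the ARTICLE dictionary and the to_latex and to_html functions are hypothetical names invented for this illustration, the output is simplified (no author, no table of contents), and the LaTeX side does no escaping of special characters.

```python
from html import escape

# Hypothetical structured description of the sample article:
# a title plus (heading, paragraph) pairs, with no formatting at all.
ARTICLE = {
    "title": "Minimalism",
    "sections": [
        ("Start", "Well, and here begins my lovely article."),
        ("End", "... and here it ends."),
    ],
}


def to_latex(doc):
    """Render the description as a small LaTeX article.

    Assumption: the text contains no LaTeX special characters
    (such as %, &, or _), because nothing is escaped here.
    """
    lines = [
        r"\documentclass{article}",
        r"\title{%s}" % doc["title"],
        r"\begin{document}",
        r"\maketitle",
    ]
    for heading, text in doc["sections"]:
        lines.append(r"\section{%s}" % heading)
        lines.append(text)
    lines.append(r"\end{document}")
    return "\n".join(lines)


def to_html(doc):
    """Render the same description as a small HTML page."""
    title = escape(doc["title"])
    lines = [
        "<html>",
        "<head>",
        "<title>%s</title>" % title,
        "</head>",
        "<body>",
        "<h1>%s</h1>" % title,
    ]
    for heading, text in doc["sections"]:
        lines.append("<h2>%s</h2>" % escape(heading))
        lines.append("<p>%s</p>" % escape(text))
    lines.extend(["</body>", "</html>"])
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_latex(ARTICLE))
    print()
    print(to_html(ARTICLE))
```

Each section heading becomes \section{...} in one output and <h2>...</h2> in the other; only the markup wrapped around the text changes.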

Other markup languages common on Unix systems include DocBook, an application of SGML or XML in which a lot of Linux documentation is written, and Texinfo, the source language of info pages, in which most GNU documentation is written. In fact, the manuscript for this edition of Unix Power Tools is written in a variant of SGML-based DocBook.

-- DJPH




Copyright © 2003 O'Reilly & Associates. All rights reserved.