HTML: The Definitive Guide

Chapter 4
 

Text Basics

Contents:
Divisions and Paragraphs
Headings
Changing Text Appearance
Content-based Style Tags
Physical Style Tags
Expanded Font Handling
Precise Spacing and Layout
Block Quotes
Addresses
Special Character Encoding

In this day and age of hoopla and hype, "how" has become almost as important as, and in some cases more important than, "what." Any successful presentation, even a thoughtful tome, should have its text organized into an attractive, effective document. Organizing text into attractive documents is HTML's forte. The language gives you a number of tools that help you mold your text and get your message across. HTML also helps structure your document so your target audience has easy access to your words.

Always keep in mind while designing your documents, though (here we go again!), that HTML tags, particularly in regard to text, only advise--they do not dictate--how a browser will ultimately render the document. Rendering varies from browser to browser. Don't get too entangled with trying to get just the right look and layout. Your attempts may and probably will be thwarted by the browser.

4.1 Divisions and Paragraphs

Like most text processors, a browser wraps the words it finds in the HTML text to fit the horizontal width of its viewing window. Widen the browser's window and words automatically flow up to fill the wider lines. Squeeze the window and words wrap downwards.

Unlike most text processors, however, HTML uses explicit division (<div>), paragraph (<p>), and line-break (<br>) tags to control the alignment and flow of text. Return characters, although quite useful for readability of the source HTML document, do not cause line breaks in the displayed text; the browser treats them like ordinary spaces.
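
As a small illustration of the point about return characters, consider the following fragment. The wording is invented for this example. The source lines break in arbitrary places, yet a browser would be expected to flow the first four lines into one continuous run of wrapped text and to start a new line only at the explicit <br> tag:

```html
<body>
These words are split
across several short
source lines, yet they are displayed as
one continuous run of text.
<br>
Only the tag above forces this sentence onto a new line.
</body>
```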

The <div> Tag

The <div> tag is part of the developing HTML 3.2 standard and is currently supported only by the Netscape browser.

As proposed in the HTML 3.2 standard, the <div> tag specification divides a document into separate divisions, and has a variety of formatting options that let you define a unique style for each division. You might have one for your document's abstract, another for the body, a third for the conclusion, and a fourth for the bibliography, for instance. Each division has a different default format: the abstract indented and in an italic face; the body in a left-justified Roman face; the conclusion similar to the abstract; and the bibliography automatically numbered and formatted appropriately.

Sadly, the only feature of <div> retained by the folks at Netscape is its ability to alter the alignment of all text within the division. Thus, in its current incarnation, Netscape's <div> serves just a slightly more general purpose than the <center> tag, which serves to center large blocks of body content.

The align attribute

The align attribute for <div> aligns the enclosed content to the left (default), center, or right of the browser display. The <div> tag may be nested, and the alignment of a nested <div> tag takes precedence over that of the containing <div> tag. Further, other nested alignment tags, such as <center>, aligned paragraphs (see <p> below), or specially aligned table rows and cells, override the effect of <div>.
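
The nesting rule can be sketched as follows, assuming a browser that implements <div> with the align attribute as described above. The text inside the divisions is placeholder content made up for the illustration:

```html
<div align=right>
This text is aligned to the right margin.
<div align=center>
The nested division takes precedence, so this text is centered.
</div>
Back in the outer division, this text is right-aligned again.
</div>
```

Closing the inner division with </div> should return control to the alignment of the enclosing division.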

The <p> Tag

The <p> tag signals the start of a paragraph. That's not well known even by some veteran webmasters because it runs counter to what we've come to expect from experience. Most word processors we're familiar with use just one special character, typically the return character, to signal the end of a paragraph.

In the recommended HTML way, each paragraph starts with <p> and ends with the corresponding </p> tag. And while each newline character in a run of them adds another blank line to a document displayed by a text processor, HTML browsers typically ignore all but the first in a sequence of consecutive paragraph tags.

In practice, you can also omit the starting <p> tag at the beginning of the first paragraph, as well as the </p> tag at the end of all paragraphs, since they can be implied from other tags that occur in the document. For example:

<body>
This is the first paragraph, at the very beginning of the 
body of this document.
<p>
The tag above signals the start of this second paragraph. 
When rendered by a browser, it will begin slightly below the 
end of the first paragraph, with a bit of extra whitespace 
between the two paragraphs.
<p>
This is the last paragraph in the example.
</body>

Notice that we haven't included the paragraph start tag (<p>) for the first paragraph or any end paragraph tags at all in the example; they can be unambiguously inferred by the browser and are therefore unnecessary.

In general, you'll find that human document authors tend to omit implied tags whenever possible, while automatic document generators tend to insert them. That may be because the software designers didn't want to run the risk of having their product chided by competitors as not adhering to the HTML standard, even though we're splitting letter-of-the-law hairs here. Go ahead and be defiant: Omit that first paragraph's <p> tag and don't give a second thought to paragraph ending </p> tags, provided, of course, that your document's structure and clarity are not compromised.
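
For comparison, here is roughly what a document generator might emit for the same three paragraphs, with every implied tag written out. The second paragraph is shortened here to save space. A browser should render this the same way as the abbreviated version shown earlier:

```html
<body>
<p>
This is the first paragraph, at the very beginning of the
body of this document.
</p>
<p>
The tag above signals the start of this second paragraph.
</p>
<p>
This is the last paragraph in the example.
</p>
</body>
```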

Paragraph rendering

When encountering the new paragraph (<p>) tag, a browser typically inserts a blank line one character high, plus some extra vertical space, into the document before starting the new paragraph. The browser then collects all the words and, if present, inline images into the new paragraph, ignoring leading and trailing spaces (not spaces between words, of course) and return characters in the HTML text. The browser software then flows the resulting sequence of words and images into a paragraph that fits within the margins of its display window, automatically generating line breaks as needed to wrap the text within the window. For example, compare how a browser arranges the text into lines and paragraphs (Figure 4-1) to how the preceding HTML example is printed on the page. The browser may also automatically hyphenate long words, and the paragraph may be fill-justified to stretch the line of words out towards both margins.

The net result is that you do not have to worry about line length, word wrap, and line breaks when composing your HTML documents. The browser will take any arbitrary sequence of words and images and display a nicely formatted paragraph.

If you want to control line length and breaks explicitly, consider using a preformatted text block with the <pre> tag. If you need to force a line break, use the <br> tag. [<pre>, 4.7.5] [<br>, 4.7.1]
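
A brief sketch of both techniques follows; the verse and the small table are made-up sample content. The <br> tag forces a break inside an otherwise ordinary paragraph, while the <pre> block asks the browser to keep the spaces and line breaks exactly as typed, usually in a fixed-width font, so the columns stay aligned:

```html
<p>
Roses are red,<br>
Violets are blue.
</p>
<pre>
Item        Qty
Apples        3
Pears        12
</pre>
```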

The align attribute

The next version of HTML may standardize a way to control how the browser justifies the contents of a paragraph. The latest versions of the popular browsers already implement one way: the special align extension.

Left justification is the default paragraph alignment for most browsers. It is also what all browsers revert to when they encounter a new <p> tag that has no align attribute. Currently, Internet Explorer and some earlier Netscape versions (1.x) let you center-justify a paragraph with the align attribute and a value of center. Netscape 2.0 supports a longer list, letting you set the paragraph alignment to one of three possible values: left, right, or center.

Figure 4-2 shows you the effect of each alignment, as rendered from the following source:

<p align=right>
Right over here!
<br>
This is too. 
<p align=left>
Slide back left. 
<p align=center>
Smack in the middle. 
</p>
Left's the default.

Notice in the example that the paragraph alignment remains in effect until the browser encounters another <p> tag or an ending </p> tag. Other body elements may also disrupt the current paragraph alignment and cause subsequent paragraphs to revert to the default left alignment, including forms, headings, tables, and most other body content-related tags.

Allowed paragraph content

An HTML paragraph may contain any element allowed in a text flow, including conventional words and punctuation, links (<a>), images (<img>), line breaks (<br>), font changes (<b>, <i>, <tt>, <u>, <strike>, and <font>), and content-based style changes (<cite>, <code>, <em>, <kbd>, <samp>, <strong>, and <var>). If any other element occurs within the paragraph, it implies the paragraph has ended, and the browser assumes the closing </p> tag was not specified.
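
To make the rule concrete, here is a hedged sketch with invented content. The paragraph mixes several of the allowed inline elements, and the list that follows is not allowed inside a paragraph, so its start tag should imply the missing </p>:

```html
<!-- intro.html is a hypothetical file name used only for illustration -->
<p>
Read the <a href="intro.html">introduction</a> first; it is
<em>strongly</em> recommended. Type <kbd>help</kbd> at the prompt
for <b>more</b> information.
<ul>
<li>The list start tag above is not allowed inside a paragraph,
so it implies the end of the preceding paragraph.
</ul>
```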

Allowed paragraph usage

You may specify a paragraph only within a block, along with other paragraphs, lists, forms, and preformatted text. In general, this means that paragraphs can appear where a flow of text is appropriate, such as in the body of a document, an item in a list, and so on. Technically, paragraphs cannot appear within a heading, anchor, or other element whose content is strictly text-only. In practice, most browsers ignore this restriction and format the paragraph as a part of the containing element.
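
As one illustration of an appropriate place for a paragraph, the following sketch (with placeholder text) puts a paragraph inside a list item, which holds a flow of text:

```html
<ul>
<li>This list item begins with ordinary text.
<p>
A paragraph may follow within the same item, because a list
item holds a flow of text.
<li>This second item has no paragraph of its own.
</ul>
```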

