Introduction

The tables were created for two purposes: To this end, anyone not familiar with this topic is urged to read the following note with care, and preferably also to consult the associated briefing/tutorial.

The formal statement of what a code point is supposed to represent is the description contained in column 5. If you have a standards-compliant browser configuration, it will also be displaying the actual characters in columns 6, 7 and (we hope) 8, but in the event of any discrepancies, it's column 5 that you should believe. The table only covers the code points from 160 (decimal) upwards, because

It cannot be repeated too often that the many people who have displayed all 256 possible code points on their screens, and have assumed that what they see there is a definition of ISO8859-1, are very seriously misguided. Their web pages, no matter how well-intentioned, are highly confusing, and they only mislead others who have not yet understood the problem.

Authors are also advised to consult my report on browsers, which shows the extent to which some representative browsers support the mechanisms described here, and offers advice to authors on how to code their HTML for best results.

Entity name variants and test cases

This section includes a number of variants of entity names known to me: some were introduced in HTML+ or HTML3.0 (drafts that are now expired), some come from the ISO* entity name files for SGML, and some come from proposals more or less relevant to HTML, such as Hyper-G. It is perhaps worth noting that the text of the HTML3.0 draft described "emdash" and "endash", but the associated DTD contained "mdash" and "ndash" - the HTML3.0 draft was never really completed, and the discrepancy was never corrected. Updating this writeup in January 1997, I would say: look to the Cougar proposal for the current wisdom on these topics.

The character code test table, linked to the next section of the present document, contains only one of these entity names per character. It conforms to the "Proposed Entities" list in the HTML2.0 Specification (RFC1866) and to the "HTML 3.2 Reference Specification" (W3C Recommendation 14-Jan-1997), which in turn are consistent with the larger lists in the ISOnum and ISOdia entity sets used in SGML. In the ISO entity sets, uml and die are alternative names for the same glyph, intended to be used according to context. The HTML specifications use only the uml variant, and by now the current browsers do support that; some also support die, and some older browsers supported only die. (Look them up in a good dictionary if you want to know the difference - "dieresis" if it's a US dictionary.) As neither of them is of much use in HTML, this is usually nothing to worry about. In any case, all browsers (except for some pre-HTML2.0 browsers, which can now be ignored) support the numerical character references, which neatly side-step the problem.
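
As a rough illustration of how numerical character references side-step the naming problem, here is a minimal Python sketch. It assumes that the `name2codepoint` table in Python's standard `html.entities` module (which holds the HTML 4 names) is an acceptable stand-in for "the standard list"; the helper name `to_numeric_reference` is made up for this example.

```python
from html.entities import name2codepoint  # HTML 4 entity names only


def to_numeric_reference(entity_name):
    """Return the numerical character reference for an entity name.

    Returns None when the name is not in Python's HTML 4 table, which is
    the case for older variants such as "die" or "emdash".
    """
    code_point = name2codepoint.get(entity_name)
    if code_point is None:
        return None
    return "&#{};".format(code_point)


# Show which of a few names have a numerical equivalent in the table.
for name in ("uml", "die", "macr", "ndash", "emdash"):
    print(name, to_numeric_reference(name))
```

A name that yields None here is exactly the kind of variant for which writing the numerical reference by hand is the safer choice.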

The &shy; entity, and its equivalents &#173; and the 8-bit character, should be treated specially, as explained in the main briefing and in RFC2070. Its rendering in the test tables is not significant, as those tables are not using it in the proper fashion. There should be a specific test to ensure that the browser is handling it correctly (i.e. suppressing it if it is not contiguous with a linebreak, and rendering a hyphen if it is).
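
The behaviour described in the parenthesis can be sketched in a few lines of Python. This is only an illustration of the rule as stated here (soft hyphen, code point 173, shown as a hyphen when it sits directly before a line break and suppressed otherwise); it assumes the text has already been broken into lines, and the function name `render_lines` is hypothetical. It is not a description of how any particular browser works.

```python
SOFT_HYPHEN = "\u00ad"  # code point 173, the soft hyphen


def render_lines(lines):
    """Render text that has already been broken into lines.

    A soft hyphen directly before a line break (the last character of any
    line except the final one) is shown as a visible hyphen; every other
    soft hyphen is suppressed.
    """
    rendered = []
    last_index = len(lines) - 1
    for index, line in enumerate(lines):
        followed_by_break = index < last_index
        shows_hyphen = followed_by_break and line.endswith(SOFT_HYPHEN)
        visible = line.replace(SOFT_HYPHEN, "")
        rendered.append((visible + "-") if shows_hyphen else visible)
    return rendered


# "hy(shy)phen(shy)" broken before "ation": only the break shows a hyphen.
print(render_lines(["hy\u00adphen\u00ad", "ation"]))
```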

Description                         entity name    test case
                                    or numerical ref.
-----------                         -----------    ---------
Umlaut mark or diaeresis            uml            ¨
                                    die            ¨
macron (overbar)                    macron         &macron;
                                    macr           ¯
                                    hibar          &hibar;
degree                              degree         &degree;
                                    deg            °
cedilla                             Cedilla        ¸
                                    cedil          ¸

The following are not part of the ISO-8859-1 repertoire:
(numerical character references larger than 255 are from Unicode)

trade mark (TM)                     trade          ™
  ditto as numerical:               #8482          ™
endash (old version)                endash         &endash;
endash (current version)            ndash          –
  ditto numerical:                  #8211          –
emdash (old version)                emdash         &emdash;
emdash (current version)            mdash          —
  ditto numerical:                  #8212          —

aleph symbol (from HTML4.0)         alefsym        ℵ
  ditto numerical:                  #8501          ℵ
(in HTML+, this was:                aleph          ℵ )

"Non-white" space (shown in brackets for clarity):
                                    ensp           [ ]
                                    #8194          [ ]
                                    emsp           [ ]
                                    #8195          [ ]

some folks seem to think these are  enspace        [&enspace;]
                                    emspace        [&emspace;]
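
For authors who have inherited documents that use the variant names listed above, a small clean-up pass is one possible remedy. The Python sketch below is an assumption-laden illustration: the mapping is taken from the list in this section, the standard names are looked up in Python's `html.entities.name2codepoint` (HTML 4) table, and `normalise_entities` is a made-up helper name. Note that it rewrites every recognised named entity, standard ones included, as a numerical reference, and leaves unknown names untouched.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names only

# Variant name -> standard name, as listed in this section.
VARIANT_TO_STANDARD = {
    "die": "uml",
    "macron": "macr",
    "hibar": "macr",
    "degree": "deg",
    "Cedilla": "cedil",
    "endash": "ndash",
    "emdash": "mdash",
    "aleph": "alefsym",
    "enspace": "ensp",
    "emspace": "emsp",
}

ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def normalise_entities(text):
    """Rewrite known named entities (and their variants) as numerical refs."""
    def replace(match):
        name = match.group(1)
        standard = VARIANT_TO_STANDARD.get(name, name)
        code_point = name2codepoint.get(standard)
        if code_point is None:
            return match.group(0)  # unknown name: leave it alone
        return "&#{};".format(code_point)

    return ENTITY_PATTERN.sub(replace, text)


print(normalise_entities("5&degree; &emdash; &unknown;"))
```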

The Table

The columns of the table are as follows.
1,2,3: Code value, in hex, decimal and octal respectively
4: Entity name - note that HTML2 browsers generally honour only a subset of the entities from the list; browsers that conform with HTML3.2 (Wilbur) must support all the ones in the table, though they might or might not support variants such as those noted above.
5: Description of the associated character
6: The character itself, sent as an 8-bit character from the server
7: The &#number; representation of the code point
8: The &entity; representation, using the name given in column 4. If your browser supports it, this will produce the desired effect; if not, the browser most commonly seems to display the &entity; sequence itself.
9: Comments
*M marks characters that are typically displayed wrongly by Macintosh-based browsers.

If the browser is behaving as desired, then columns 6, 7 and (where the browser supports it) column 8 should all be displaying the glyph appropriate to the description in column 5.

If your browser supports at least the basic elements of the HTML3 <TABLE> construct, you can view the TABLE format test document; any browser should be able to view the pre-formatted test document.

Technical note: the tables were created by executing a REXX script.
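
The REXX script itself is not reproduced here. Purely as an illustration of how rows with the first eight columns could be generated, here is a Python sketch; it is not the original script. It assumes that Python's `html.entities.codepoint2name` table (HTML 4 names) matches column 4, and it uses the Unicode character names from `unicodedata` for column 5, which will not be worded the same as the descriptions in the actual tables. The function names are made up, and the comments column is omitted.

```python
import unicodedata
from html.entities import codepoint2name  # HTML 4 entity names only


def table_row(code_point):
    """Build columns 1 to 8 for one code point, as a tuple of strings."""
    name = codepoint2name.get(code_point, "")
    description = unicodedata.name(chr(code_point), "(no name)")
    return (
        "{:02X}".format(code_point),        # 1: hex
        str(code_point),                    # 2: decimal
        "{:03o}".format(code_point),        # 3: octal
        name,                               # 4: entity name
        description,                        # 5: description (Unicode name)
        chr(code_point),                    # 6: the character itself
        "&#{};".format(code_point),         # 7: numerical reference
        "&{};".format(name) if name else "",  # 8: entity reference
    )


def build_table(first=160, last=255):
    """Rows for the code points first..last inclusive."""
    return [table_row(cp) for cp in range(first, last + 1)]


for row in build_table(160, 163):
    print(" | ".join(row))
```

To send column 6 as a genuine 8-bit character, the output would have to be encoded as ISO-8859-1 when written, for example with `text.encode("iso-8859-1")`.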

Addendum: non-break space test

This section tests your browser for its behaviour on one or several non-break spaces. To avoid complications, no kind of indenting or formatting is attempted here; just the plain test materials, left aligned, and nothing more.

First, ordinary (i.e. not pre-formatted) text

For comparison, lines that use a single ordinary space are also shown:
|| no space at all
| | a single ordinary space
| | a single nbsp
|   | three nbsp
| | numerical ref. &#160;
|   | three of those
| | a single ordinary space again

Now the same thing inside PRE-formatted text

The layout is the same as before.
|| no space at all
| | a single ordinary space
| | a single nbsp
|   | three nbsp
| | numerical ref. &#160;
|   | three of those
| | a single ordinary space again
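
For anyone wishing to reproduce this test, the lines can be generated rather than typed by hand. The Python sketch below is one hypothetical way to do it (the function name `nbsp_test_lines` is made up); it emits the HTML source for the seven lines, using &nbsp; and &#160; where the test calls for a non-break space, and then uses the standard `html.unescape` function to check that both forms decode to the same character, code point 160.

```python
import html

NBSP = "\u00a0"  # the non-break space, code point 160


def nbsp_test_lines():
    """Return the HTML source for the seven test lines."""
    cases = [
        ("", "no space at all"),
        (" ", "a single ordinary space"),
        ("&nbsp;", "a single nbsp"),
        ("&nbsp;" * 3, "three nbsp"),
        ("&#160;", "numerical ref. &amp;#160;"),
        ("&#160;" * 3, "three of those"),
        (" ", "a single ordinary space again"),
    ]
    return ["|{}| {}".format(gap, label) for gap, label in cases]


for line in nbsp_test_lines():
    print(line)

# Both notations should decode to the same single character.
print(html.unescape("&nbsp;") == html.unescape("&#160;") == NBSP)
```

The same lines can be placed once in ordinary text and once inside a PRE element to cover both halves of the test.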
