Q: HOW DO I GET A SO-AND-SO CHARACTER IN MY HTML? (update: Dec1997)
A: The safest way to write HTML is in (7-bit) US-ASCII, expressing
other characters with HTML entities (&entityname;) or numerical
character references (&#number;).
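
As an illustration of the 7-bit approach, here is a minimal Python
sketch that rewrites every non-ASCII character as a &#number;
reference. The function name to_ascii_html is made up for this example;
the "xmlcharrefreplace" error handler is a standard part of Python's
codec machinery. It does not touch <, > or &, which must still be
written as &lt;, &gt; and &amp; where they are meant literally.

```python
def to_ascii_html(text: str) -> str:
    """Return text as 7-bit ASCII, replacing every other character
    with a numerical character reference of the form &#number;."""
    # "xmlcharrefreplace" substitutes &#NNN; for anything ASCII can't encode.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# \u00e9 is e-acute (code 233), \u00a9 is the copyright sign (code 169).
print(to_ascii_html("caf\u00e9 \u00a9 1997"))  # prints: caf&#233; &#169; 1997
```
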
Working with 8-bit characters can also succeed in many practical
situations: on Unix/X and MS-Windows (using Latin-1), and also on Macs
(with some reservations).
The available characters, up to and including HTML3.2, are those in
ISO-8859-1, listed in the HTML2.0 specification, in the HTML3.2
recommendation, and at http://www.htmlhelp.com/reference/charset/
A failure to render any of these characters would be a serious fault
in any current browser.
When authoring on platforms whose own character code isn't ISO-8859-1,
such as MS-DOS or the Mac, there may be problems: you'd need to use text
transfer methods that convert between the platform's own code and
ISO-8859-1 (e.g. Fetch for the Mac), or convert separately (e.g. GNU
recode). Using 7-bit ASCII with entities avoids those problems, and
this FAQ is too small to cover other possibilities in detail. Mac
users - see the notes at the above URL.
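
To show what "convert separately" amounts to, here is a rough Python
sketch that re-encodes bytes from a platform's own code into
ISO-8859-1. The helper name to_latin1 is hypothetical, and the codec
names "mac_roman" and "cp437" are simply the ones Python uses for the
Mac and MS-DOS (US) character codes; other code pages would need other
names. A character with no ISO-8859-1 equivalent raises
UnicodeEncodeError rather than being silently mangled.

```python
def to_latin1(raw: bytes, platform_codec: str) -> bytes:
    """Re-encode bytes from a platform's own character code
    into ISO-8859-1 (Latin-1)."""
    return raw.decode(platform_codec).encode("iso-8859-1")


# e-acute is byte 0x8E on the Mac and 0x82 in DOS code page 437,
# but 0xE9 in ISO-8859-1.
print(to_latin1(b"\x8e", "mac_roman"))  # prints: b'\xe9'
print(to_latin1(b"\x82", "cp437"))      # prints: b'\xe9'
```
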
If you run a Web server (httpd) on a platform whose own character code
isn't ISO-8859-1, such as a Mac or an IBM mainframe, it's the job of the
server to convert text documents into ISO-8859-1 code when sending them
to the network.
Some browsers have had coverage for other character repertoires and
encodings for some time, but it is only recently, since RFC2070 codified
it, that some uniformity of coverage has appeared. It is possible
to use these extensions to some extent on the WWW now, but the details
go beyond what can be covered in this FAQ: look for more-detailed
discussions elsewhere.
Some communities have been working together for a long time in codes
other than ISO-8859-1, but their documents might not be accessible
(or might only be accessible by taking special precautions, such as
manually changing the browser's document encoding setting) to other
WWW readers.
---
Q: Which should I use, the &entityname; or the &#number; ?
A: Browsers complying with HTML3.2 must support both, but some older
browsers have only limited coverage of the names.
1. for the Latin-1 accented letters, also lt, gt, amp, and quot when
needed, use the &entityname; form. They are easily remembered, and
browser coverage is excellent.
2. copy, reg and nbsp are now well-covered by current browsers.
Using &#number; instead may still be helpful for some older browsers.
3. for the remainder, browsers were rather late in supporting them all.
Browsers complying with HTML3.2 will honor their entity names, but there
may still be browser versions in use that don't, so it might still be
wise to use the &#number; references for these characters.
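
The three rules above could be sketched in Python roughly as follows.
This is only an illustration under some assumptions: the function name
reference_for and its "cautious" flag are invented here, "Latin-1
accented letters" is taken to mean codes 192-255 except the multiply
and divide signs (215 and 247), and the quotation mark is left alone
because it only needs escaping inside attribute values. The name table
codepoint2name comes from Python's standard html.entities module.

```python
from html.entities import codepoint2name


def reference_for(ch: str, cautious: bool = True) -> str:
    """Pick a name or number reference for one character,
    following rules 1-3 above."""
    cp = ord(ch)
    if cp < 128 and ch not in "<>&":
        return ch                      # plain ASCII needs no reference
    name = codepoint2name.get(cp)
    # Assumption: codes 192-255 minus the multiply/divide signs are "letters".
    is_letter = 0xC0 <= cp <= 0xFF and cp not in (0xD7, 0xF7)
    if name in ("lt", "gt", "amp") or is_letter:
        return f"&{name};"             # rule 1: names are well supported
    if name in ("nbsp", "copy", "reg") and not cautious:
        return f"&{name};"             # rule 2: fine in current browsers
    return f"&#{cp};"                  # rules 2 (cautious) and 3: use the number


# \u00e9 = e-acute, \u00bd = one-half, \u00a9 = copyright sign
sample = "Caf\u00e9 \u00bd \u00a9 <b>"
print("".join(reference_for(c) for c in sample))
# prints: Caf&eacute; &#189; &#169; &lt;b&gt;
```
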
There is no need to use &quot; or &#34; instead of a quotation mark
except where it would have significance to HTML, i.e. in attribute
values; use of those notations in other places is not wrong, but it
pointlessly inflates the size of HTML documents.
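
The distinction between body text and attribute values can be
illustrated with Python's standard html.escape function, whose quote
argument controls whether quotation marks are converted. The two
wrapper names below (body_text, attr_value) are made up for this
sketch.

```python
from html import escape


def body_text(s: str) -> str:
    """Escape for ordinary text content: quotation marks stay as they are."""
    return escape(s, quote=False)


def attr_value(s: str) -> str:
    """Escape for use inside a quoted attribute value."""
    return escape(s, quote=True)


text = 'He said "hi" & left'
print(body_text(text))                     # He said "hi" &amp; left
print('<img alt="%s">' % attr_value(text))
# <img alt="He said &quot;hi&quot; &amp; left">
```
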
---