charset Burp
The "Netscape burp" is only one aspect of a more general, and
potentially security-relevant, issue about character coding in
HTTP protocol transfers.
There's an article, "i18n: HTTP - charset", at the W3C web site, urging authors to set a proper charset
attribute on their HTTP protocol headers
(i.e. not merely in their "meta http-equiv").
According to CERT security alert CA-2000-02,
it is a potential security exposure to send out HTML and other
text-type documents without an explicit character coding
(charset=) specified.
In practice there are two ways of specifying the character coding
for HTML files sent out by HTTP protocol:
(1) the Content-type HTTP header, and (2) a
META HTTP-EQUIV element within the document.
(There are other issues in the case of XML/XHTML documents; this
is not the place to go into those details).
My recommendation would be to specify the character coding on the HTTP
header whenever possible.
Many information providers consider that specifying it via META
is easier for them, and has some advantages when viewing files locally or
using FTP; however, there are quite a number of theoretical and practical
reasons for preferring the real HTTP header in a WWW context; not
all of those reasons are set out here.
Various versions of the Netscape 4.* browser have a tendency to "burp" when
an HTML document contains a META HTTP-EQUIV that specifies
a charset value for the document.
We point to another resource on the topic, and offer a possible
solution.
A fix was included in NS4.5PR2 and the subsequent release, but it seems there are some situations where the effect is still observed in that browser. But, as I say, there are more fundamental considerations of principle: even if/when the Netscape-4 problem is considered to be ancient history, there's still the advice of CA-2000-02 to take into account.
This page also deals with some other oddities of
META HTTP-EQUIV handling in Netscape version(s).
It's been noticed for quite a while now that some kinds of HTML document cause Netscape 4.* to "burp": it gives an impression of starting to display the document, and then briefly stops, and then starts over again. Investigating server statistics shows that it may re-load the whole document from the server. Sometimes on a form submission, NS even puts up a dialogue asking whether it should re-post the submission to the server (which is very disturbing if, in fact, the form was supposed to be placing an order, or doing something else that should not be arbitrarily repeated).
After some study and discussion, people concluded that all of the documents that were involved in this effect contained a
<META HTTP-EQUIV="Content-type"
CONTENT="text/html;charset=something">
However, the burp doesn't always occur, even with pages that
contain this item.
After some discussion on usenet, Sander Tekelenburg created
a web page to report his investigations of
the burp.
He concluded that the burp did not occur if the
document was already in the browser's cache.
Aside from that,
the burp was observed if there was anything
ahead of the META HTTP-EQUIV,
such as an HTML comment or, importantly,
an SGML DOCTYPE declaration.
So, the only way to be sure of avoiding the problem if
you have this kind of META HTTP-EQUIV
in the document is to put it right at the top.
Well, as Sander pointed out, it's technically mandatory to
have a DOCTYPE, and its absence causes problems for on-line
validation etc; the DOCTYPE must of course
come before the HEAD.
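For anyone wanting to audit existing pages for the risky pattern, here is a rough sketch in Python. It takes the report above at face value (anything other than whitespace ahead of the META is treated as a possible burp trigger) and has not been checked against Netscape itself. The function name and the regular expression are my own illustrative choices, and the pattern only recognises the attribute order quoted earlier.

```python
import re

# Matches a META HTTP-EQUIV="Content-type" element that carries a charset,
# with the attributes in the order quoted in the text. This is an
# illustrative pattern, not a full HTML parser.
META_CHARSET = re.compile(
    r'<meta\s+http-equiv\s*=\s*["\']?content-type["\']?\s+'
    r'content\s*=\s*["\'][^"\'>]*charset\s*=[^"\'>]*["\']\s*>',
    re.IGNORECASE,
)

def text_ahead_of_meta(html):
    """Return whatever precedes the META charset element (whitespace
    stripped), or None if the document contains no such element."""
    match = META_CHARSET.search(html)
    if match is None:
        return None
    return html[:match.start()].strip()
```

An empty string means the META really is at the very top; a non-empty result (a DOCTYPE, a comment, and so on) is the situation in which the burp was reported.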
But, according to the HTML4.0 recommendation, it is also
mandatory to specify the document's charset,
in at least one of the available ways: the HTML4.0 recommendation
goes so far as to forbid client agents to assume a default
charset (even if browser designers tend to disregard that
mandate), and the alert CA-2000-02 cited above also gives
a motivation for authors to define this attribute.
There are two ways that are generally available for specifying
a charset: a META HTTP-EQUIV in the
HEAD, or a real HTTP header on the network
transaction.
Many people appear to be convinced that the only one of these
which is actually available to them is the
META HTTP-EQUIV, supposing that the other is
not accessible to them on the server that they use.
Well, of course I can't guarantee any particular case, but I can report that numerous people who have tried the following recipe on the server that they use have found to their surprise (and in some cases, to the surprise of their server admin!) that it works. Certainly, this is defined to work on Apache and NCSA, although a server admin can control whether AddType directives in the .htaccess file are honoured. Well, if you don't try it, you'll never know.
"Why make do with an ersatz HTTP-EQUIV, when you could have a real HTTP header?".
In the .htaccess file of the relevant (or higher) subdirectory on the web server, place an entry such as the following:
AddType text/html;charset=iso-8859-1 html
This specifies that for file
extensions of html, the document will be sent out
with a modified Content-type header, with the charset
specified as shown.
This can be extended to other charset values
if you use different filename extensions according to the
desired charset value.
This form of the directive works even for antique versions of Apache version 1. Nowadays it's more customary to set the content-type and "charset" separately, as mentioned below.
You'll find examples of the above technique used
in my
charset Playground.
You may notice that the example given in the HTML recommendation has a space between the semicolon and the "charset", but this space isn't mandatory.
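To illustrate that the space makes no difference to a reasonable header parser, here is a small sketch using Python's standard email.message module. The helper name charset_of is my own; Python's parser stands in for "a client" here purely as an assumption for illustration, and says nothing about how any particular browser behaves.

```python
from email.message import Message

def charset_of(content_type):
    """Return the lower-cased charset parameter of a Content-Type header
    value, or None if the value carries no charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

print(charset_of("text/html;charset=iso-8859-1"))   # iso-8859-1
print(charset_of("text/html; charset=iso-8859-1"))  # iso-8859-1
print(charset_of("text/html"))                      # None
```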
If in any doubt on server configuration issues such
as this, don't hesitate to consult the
excellent Apache
server documentation (which should also be bundled with whichever
version of Apache you are using).
In current Apache versions you can control the Content-type
(e.g. text/html) separately from the charset
(with AddType and AddCharset directives
respectively) if you prefer.
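Whichever form of directive you use, it's worth confirming what the server really sends rather than trusting the configuration. The following is a minimal sketch in Python, assuming the server answers HEAD requests; the function name served_content_type and the URL in the comment are hypothetical, chosen only for illustration.

```python
from urllib.request import Request, urlopen

def served_content_type(url, timeout=10):
    """Return the Content-Type header the server sends for `url`.

    A HEAD request is used so that the document body is not transferred.
    Returns None if the server sends no Content-Type header at all.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")

# Example call (hypothetical URL, substitute one of your own pages):
#   served_content_type("http://www.example.org/page.html")
```

If the recipe has taken effect, the returned value should include the charset parameter that you configured; if it comes back as bare text/html, the directive is probably not being honoured for that directory.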
META...charset
In May 2002, A.Prilop called my attention to a long-standing misbehaviour in Netscape versions up to and including 4.*, of which I had been unaware. According to his report, the following construct
<!-- <META HTTP-EQUIV="Content-type"
CONTENT="text/html; charset=something"> -->
resulted, in spite of the META element supposedly
being commented-out, in Netscape using the enclosed character
coding in rendering the page.
Reportedly this bug has been present since version 2.0.
I looked into this myself in some recent versions (4.7x) as well as
an older version (3.01) of Netscape, and found that the problem was
even more curious.
As A.P had reported, indeed the page was rendered according to the
commented-out META; confusingly, however, the
View->Page Info menu showed the document coding as "Unknown"
(on version 4.*; "(default)" on version 3.01), as if the commented-out
element had been correctly ignored.
Nevertheless, it was most definitely the case that this character
coding was being used to determine the page rendering
(there was no other possible cause for the 8-bit characters in the
test documents to be rendered as they were being).
The conclusion from this was that a META...charset
cannot be successfully commented-out by enclosing the whole element
in HTML comment markers.
What is, on the other hand, successful is to
replace the pointy-brackets of the element itself with the HTML comment delimiters:
<!-- META HTTP-EQUIV="Content-type"
CONTENT="text/html; charset=something" -->
This was tested and proven to work.
Continuing on this theme of broken HTML parsing in Netscape versions up to and including 4.*, A.P reported having once found a web page containing the following construct:
<meta charset="ISO-8859-2">
and being surprised to find that it actually "worked" in Netscape. And again, when this incorrect construct was commented-out in this way:
<!-- <meta charset="ISO-8859-2"> -->
Netscape again used the commented-out character coding for rendering the 8-bit characters, in spite of pretending in its response to View->Page Info that the character coding of the page was unknown.
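If you want to hunt for this construct in existing pages, the following Python sketch looks for a complete META tag (pointy-brackets and all) mentioning charset inside an HTML comment. The names are my own and the regular expressions are a simplification rather than a real HTML parser, so treat the result as a hint, not a verdict.

```python
import re

# An HTML comment, non-greedy so that each comment is matched separately.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
# A complete META tag, with its own angle brackets, that mentions charset.
META_WITH_CHARSET = re.compile(r"<meta\b[^>]*charset\s*=[^>]*>", re.IGNORECASE)

def commented_out_meta_charsets(html):
    """Return the META...charset tags found inside HTML comments."""
    found = []
    for comment in COMMENT.finditer(html):
        found.extend(META_WITH_CHARSET.findall(comment.group(1)))
    return found
```

Both of the problem constructs quoted above are reported by this check, whereas the form in which the element's own pointy-brackets have been replaced by the comment delimiters is not, since no complete META tag remains inside the comment.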
Last changed Friday, 03-Feb-2006 01:08:40 GMT
Original materials © Copyright 1994 - 2006 by A.J.Flavell & Glasgow University