Baltic languages are not my field, and most of the items
here are second-hand.
In Sept. 2000 we had a discussion about Baltic
characters, with particular reference to the Mac platform
and Lithuanian, on a German-language usenet group.
I was guided, as ever in Mac internationalization
issues, by Andreas Prilop.
I thought it useful to summarise some points from the
discussion here.
I'm doing this because of my interest in character codings and
related technologies: I have no particular expertise in the
languages themselves.
Pre-requisites: this note assumes an understanding of how character codings (are supposed to) work in HTML4. The note will likely make little sense to anyone who isn't reasonably up to speed on that. I've done my best to offer some suitable resources, and won't be repeating them here. Thanks.
Earlier Baltic 8-bit codings were iso-8859-4 and iso-8859-10. According to the W3C page the appropriate ISO choice would now be iso-8859-13, as Andreas agreed. iso-8859-13 is approximately equivalent to Windows-1257. Others confirm that in practice, Lithuanian web pages (if they claim any particular coding) predominantly claim to be in Windows-1257 coding. Note also the character database that was recommended by Jukka Korpela, where the required special characters can be researched, e.g. for Lithuanian.
Mappings for these codings can be found at the Unicode
web site: iso-8859-13
and windows-1257.
Here are my corresponding test pages, subject to the
explanation given for the "playground"
pages: iso-8859-13 and windows-1257.
(These will be sent out from the server with explicit
charset specification in the HTTP headers: see
comments on browser support in a moment.)
Aside from the Windows character coding having displayable characters in the range 128-159 decimal as usual, which the iso-8859-* codings reserve for control characters, there are also four differences between the two codings in the range 160-255. (This is analogous to the situation in Greek between iso-8859-7 and windows-1253, where there are also a few differences; but is in contrast to iso-8859-1 versus windows-1252, which are identical in this range.)
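The differences in the range 160-255 can be checked mechanically. The following is only a minimal sketch in Python: it assumes that Python's standard codecs `iso8859_13` and `cp1257` faithfully reflect the published mappings, and the function names are purely illustrative.

```python
import unicodedata


def upper_half_differences(codec_a="iso8859_13", codec_b="cp1257"):
    """Return {byte: (char_a, char_b)} for bytes 160-255 that decode differently."""
    diffs = {}
    for byte in range(160, 256):
        raw = bytes([byte])
        # errors="replace" turns an unassigned position into U+FFFD
        char_a = raw.decode(codec_a, errors="replace")
        char_b = raw.decode(codec_b, errors="replace")
        if char_a != char_b:
            diffs[byte] = (char_a, char_b)
    return diffs


def describe(char):
    """Human-readable label for one decoded character."""
    if char == "\ufffd":
        # U+FFFD here means the coding assigns nothing to this position
        return "(unassigned)"
    return f"U+{ord(char):04X} {unicodedata.name(char, '?')}"


if __name__ == "__main__":
    for byte, (a, b) in upper_half_differences().items():
        print(f"0x{byte:02X}: iso-8859-13 {describe(a)} | windows-1257 {describe(b)}")
```

With those codecs the listing should show four positions, in agreement with the count given above; two of them are simply unassigned on the Windows side.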
Netscape 4 does not support iso-8859-13 coding; however, iso-8859-13 documents can be usefully viewed in later releases of this browser version if the reader uses the "View->Character Set" menu (which really selects character encoding) to choose windows-1257. A. Prilop reports that Unix versions of Netscape 4 may not offer windows-1257 as an option (I'm unsure about the version history of the Netscape 4.* versions and just how closely the Unix, Win, and Mac versions aligned in this regard).
Alternatively, Netscape 4 can handle utf-8 coding: this would make the document incompatible with earlier browsers, but by now (2005) that might not be considered critical. Despite the existence of support for Windows-1257 in some browser versions, it would be generally deprecated to advertise a proprietary coding on the WWW. If one confined one's usage to the common subset of the two codings, then one could advertise the resulting document as being in either coding, just as we did for the Greek version of the "Quickstart" document in regard to iso-8859-7 and Windows-1253.
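Whether a given text stays within the common subset of the two codings can be tested by encoding it both ways and comparing the bytes. This is a rough sketch only, assuming Python's `iso8859_13` and `cp1257` codecs; the function name `in_common_subset` is hypothetical.

```python
def in_common_subset(text, codec_a="iso8859_13", codec_b="cp1257"):
    """True if text encodes to identical bytes in both codings."""
    try:
        return text.encode(codec_a) == text.encode(codec_b)
    except UnicodeEncodeError:
        # at least one of the codings has no position for some character
        return False


if __name__ == "__main__":
    for sample in ("ąčęėįšųūž", "\u201eLabas\u201c", "€"):
        print(repr(sample), in_common_subset(sample))
```

The Lithuanian letters themselves fall inside the common subset; typographic quotation marks and the euro sign are examples of characters that do not, since they sit at different positions or exist in only one of the two codings.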
If you are offering a utf-8-coded version of your document (or using what I describe as the conservative option), then offering it also in a version using an 8-bit character encoding can be beneficial not only to older browsers but also to some search engines: but search engine support for utf-8 is getting steadily better and soon (as of 2005) shouldn't be a problem.
Andreas points out that whereas the ISO codings are different for the "Baltic" area than for the "Central European" area, in the case of the Macs the coding for "Central European" is also used for the Baltic area, and therefore his Mac Central European resources page applies.
Andreas recommends Mac users to prepare documents using native MacCE codings, and then to use his conversion software to get a suitable WWW coding (be it 8-bit for older browser versions, or UTF-8 for more modern browsers). Again, keep in mind that although advertised under the "Central European" banner, these Mac resources are also applicable to Baltic.
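For illustration only, the kind of conversion involved can be sketched in a few lines of Python. This is not Andreas's software: it assumes that Python's `mac_latin2` codec corresponds to the Mac Central European coding, and the function name `convert_from_mac_ce` is made up for this example.

```python
def convert_from_mac_ce(raw, target="utf-8"):
    """Re-encode bytes from Mac Central European into a coding suitable for the WWW."""
    # "mac_latin2" is Python's name for the Mac CE coding (assumption: same table)
    text = raw.decode("mac_latin2")
    # raises UnicodeEncodeError if the target coding lacks one of the characters
    return text.encode(target)


if __name__ == "__main__":
    mac_bytes = "Žalgiris".encode("mac_latin2")
    print(convert_from_mac_ce(mac_bytes))                # UTF-8 for modern browsers
    print(convert_from_mac_ce(mac_bytes, "iso8859_13"))  # 8-bit for older browsers
```

Whichever target is chosen, the document must then be advertised with the matching charset.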
Please review my page on this general topic first.
In this section, we're discussing the use of a regional font with a limited repertoire. Increasingly nowadays (2005), one uses comprehensive fonts (typically unicode-based) rather than the regional fonts that were usual in earlier times, and this part of the problem resolves itself. However, the discussion is kept here as a matter of interest.
Specifying an explicitly named regional font in this kind of situation is likely to be harmful. For example, to display Baltic characters in the Arial font family, the Windows-based user would likely be using "Arial Baltic", while the Mac user would need "Arial CE". Thus an author explicitly specifying the named font for the one platform would be harmful to the other: so the best they could hope for would be that the specification was unsuccessful!
Getting "i18n" right in a WWW situation is fraught enough as it is.
I definitely cannot recommend exacerbating the situation by trying to
impose a specific choice of font on your readers.
In general I would advise authors to refrain from use of the legacy
FONT FACE construction, and to refrain from, or at least
be very cautious in, the use of font name specifications in CSS.
Authoring tools seem to be a particular problem, especially those developed predominantly in the USA: some of them haven't the remotest clue how to deal with i18n issues, and produce the most preposterous drivel when used in non-Latin-1 situations. This is exacerbated by naive authors copy/pasting into composer windows out of word processors etc., whose method of handling internationalized content might be very different from what HTML needs.
I can't offer any specific practical advice here, sorry, but before starting a WWW project involving non-Latin-1 codings using this kind of tool, it would be advisable to ask some penetrating questions about your authoring application's ability to deal with such issues. And for word-processor documents it would probably be more productive to look for purpose-designed conversion software (again, after asking the penetrating questions). If anyone tells you that the solution involves the author setting a particular font, then they are imposters! (as the other documents in this area should make clear).
On the web will be found an unfortunately large number of legacy
non-Latin-1 documents which were prepared for older browsers and which
"take advantage" of bugs in those browsers to get the desired effect. A
commonly seen abuse is for the document to pretend to be in Latin-1
coding (or in no particular coding at all), intended to be used with
some specially-prepared non-standard font. The document then contains
designations such as &egrave; or
&#232;, with the intention of displaying the
substitute characters which their font has in the corresponding
positions. This is absolutely wrong
and will not work on modern standards-conforming browsers.
As is explained in my page, there is a related technique that can be useful in extreme situations, where none of the available 8-bit codings is suitable for the purpose; but then the "user-defined coding" must be presented as real 8-bit characters, and not as Latin-1 HTML character entities or those numerical character references. (The correct numerical character references would be greater than 255, of course, but the whole point of this technique is to avoid presenting character references above 255 to legacy browsers.)
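To illustrate the parenthetical remark: the correct numerical character references for the Baltic letters are indeed all above 255, whereas a Latin-1 character such as e-grave sits at 232. A minimal Python sketch, using the standard `xmlcharrefreplace` error handler; the function name `to_ascii_with_ncrs` is hypothetical.

```python
def to_ascii_with_ncrs(text):
    """Return text as ASCII, with every non-ASCII character written as &#NNN;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # Latin-1 e-grave versus a selection of Lithuanian letters
    for char in "èąčęėįšųūž":
        print(char, to_ascii_with_ncrs(char), ord(char) > 255)
```

A reference such as &#269; names the character itself, independently of the document's coding, which is exactly why it cannot be faked with a value below 256 and a substitute font.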
Andreas suggested a couple of sample links, which seemed basically OK despite the occasional anomaly. However, one of them subsequently disappeared. Readers should also be aware that there are some other web sites out there which recommend inappropriate or even perverse techniques (such as the ones mentioned in the previous section), which give an impression of working on older browsers but are increasingly failing as browsers move to support the published standards. It may be hoped that readers who have taken on board the principles used in this part of the HTML specification, which I've tried to elucidate in the various pages in this area, will recognise these off-beam techniques when they see them.






Last changed Monday, 11-Jul-2005 19:19:01 BST
Original materials © Copyright 1994 - 2005 by A.J.Flavell & Glasgow University