Re: character sets (was: ConGermanicRomanceLang?)

From: John Cowan <jcowan@...>
Date: Tuesday, December 12, 2000, 15:03

Raymond Brown wrote:

>>> Even tho Welsh has had official status as a national language for some time
>>> now?
>>
>> Probably not when 8859-1 was devised.
>
> When was that?

1985, as it turns out, so post-Mac. I don't know whether MacRoman sprang fully into life in 1984 or developed over time, though.

>> In any case, Welsh is
>> not an official language of the U.K., only of Wales, if I
>> understand correctly.
>
> But Wales is part of the UK. It has equal status with English in Wales,
> and even in England it has been given some legal status for Welsh speakers.
> The language does have an official status within the UK, as no other
> minority language has.

Spanish and English are both official languages of the State of New Mexico, but neither is an official language of the United States. However, it is fair to say that English is the de facto language of the U.S. government, and Spanish is not.

>> 8859-14 (aka Latin-8) handles all
>> the Celtic languages, and is compatible with 8859-1 as
>> far as letters are concerned (the accented w's and y's, and
>> the dotted b, c, d, f, g, m, p, and s are squeezed in as
>> replacements for most non-ASCII symbol characters).
>
> Yep - I've never understood why most early versions of "extended ASCII" had
> symbols for y-diaeresis (not exactly common) but none for {y} with more
> common diacritics.

Because in Dutch it is a glyph variant of "ij", and it got support from French, where it occasionally occurs. To make room for the lowercase-only German sharp-s, the uppercase y-diaeresis (marginally needed in French for uppercase-only text, and not needed at all in Dutch) was dropped from 8859-1.

After Latin-1 (8859-1) was devised in 1985, it was quickly joined by Latin-2, -3, and -4 (8859-2, -3, -4). The four were intended to be regional equivalents: roughly speaking West, East, South, and North respectively. However, regionalism did not turn out to be very successful, as it cut across trade connections and relative-importance relationships. Danes, e.g., wanted the Latin-1 charset that the rest of Western Europe used, not a Latin-4 charset that could handle Greenlandic too.

The Turks rebelled directly, using not Latin-3 as the rules suggested, but a variant of Latin-1 with the Icelandic letters replaced by Turkish ones. (Latin-3 is now primarily used by Esperantists; it was designed to handle Turkish, Esperanto, and Maltese.) Turkish Latin-1 eventually got standardized as Latin-5 (8859-9).

With the further dominance of Latin-1 on the Web and its status as a subset of the Western European Windows character set CP1252 (whereas Latin-2 is *not* a subset of the Central European Windows charset CP1250), later charsets tended to be variants of Latin-1 with the less useful characters replaced by locally needed letters.
Hence Latin-6 (8859-10) and its replacement Latin-7 (8859-13) are moderately Latin-1 compatible and are meant to supersede Latin-4 for the Baltic languages that Latin-1 can't handle, while handling the Nordic languages (including Saami) as well; Latin-8 (8859-14) is the Celtic charset I discussed earlier; Latin-9 (8859-15) is a direct replacement for Latin-1 including the euro symbol and some less-used French and Finnish characters in place of rarely used symbols; the forthcoming Latin-10 (8859-16) will be Latin-1 compatible for letters, and will replace almost all the symbols with letters from the Latin-2 repertoire.
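
The relationships described above can be checked mechanically. What follows is a minimal sketch, not anything from the original discussion: it assumes that Python's built-in codecs (`latin-1`, `iso8859-2`, `iso8859-9`, `iso8859-14`, `iso8859-15`, `cp1250`, `cp1252`) faithfully reflect the charsets in question, and the helper names `high_half`, `differences`, and `can_encode` are made up for illustration. It compares only the graphic positions 0xA0..0xFF, which is where the 8859 parts differ from one another.

```python
# Illustrative helpers (hypothetical names); uses only Python's built-in codecs.

def high_half(codec):
    """Map each byte 0xA0..0xFF to the character `codec` assigns it (None if unassigned)."""
    table = {}
    for byte in range(0xA0, 0x100):
        try:
            table[byte] = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            table[byte] = None  # position left undefined by this charset
    return table


def differences(codec_a, codec_b):
    """Return {byte: (char_a, char_b)} for high-half positions where the codecs disagree."""
    a, b = high_half(codec_a), high_half(codec_b)
    return {byte: (a[byte], b[byte]) for byte in a if a[byte] != b[byte]}


def can_encode(text, codec):
    """True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Latin-5 vs Latin-1: the Icelandic letters give way to Turkish ones.
    for byte, (old, new) in sorted(differences("latin-1", "iso8859-9").items()):
        print(f"0x{byte:02X}: {old} -> {new}")

    # Latin-1 and CP1252 agree on 0xA0..0xFF; Latin-2 and CP1250 do not.
    print(len(differences("latin-1", "cp1252")),
          len(differences("iso8859-2", "cp1250")))

    # Which of these letters fit in Latin-1, and which in Latin-8?
    for ch in "ŵÿŸß":
        print(ch, can_encode(ch, "latin-1"), can_encode(ch, "iso8859-14"))
```

The restriction to 0xA0..0xFF is deliberate: CP1252 puts printable characters in 0x80..0x9F, where 8859-1 has only control codes, so "subset" here is a statement about the graphic characters alone.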

> AFAIK only Welsh puts a circumflex on top of {w} :)

Almost certainly.

-- 
There is / one art    || John Cowan <jcowan@...>
no more / no less     ||
to do / all things    ||
with art- / lessness  \\ -- Piet Hein