Re: character sets (was: ConGermanicRomanceLang?)
From: John Cowan <jcowan@...>
Date: Tuesday, December 12, 2000, 15:03
Raymond Brown wrote:
>>> Even tho Welsh has had official status as a national language for some time
>>> now?
>>
>> Probably not when 8859-1 was devised.
>
> When was that?
1985, as it turns out, so post-Mac. I don't know if MacRoman
sprang fully into life in 1984 or if it developed over time,
though.
>> In any case, Welsh is
>> not an official language of the U.K., only of Wales, if I
>> understand correctly.
>
> But Wales is part of the UK. It has equal status with English in Wales,
> and even in England it has been given some legal status for Welsh speakers.
> The language does have an official status within the UK, as no other
> minority language has.
Spanish and English are both official languages of the
State of New Mexico, but neither is an official language
of the United States. However, it is fair to say that
English is the de facto language of the U.S. government,
and Spanish is not.
>> 8859-14 (aka Latin-8) handles all
>> the Celtic languages, and is compatible with 8859-1 as
>> far as letters are concerned (the accented w's and y's, and
>> the dotted b, c, d, f, g, m, p, and s are squeezed in as
>> replacements for most non-ASCII symbol characters).
>
> Yep - I've never understood why most early versions of "extended ASCII" had
> symbols for y-diaeresis (not exactly common) but none for {y} with more
> common diacritics.
Because in Dutch it is a glyph variant of "ij", and it got
support from French, where it occasionally occurs. To make
room for the lowercase-only German sharp-s, the uppercase
y-diaeresis was dropped from 8859-1: it is only marginally
needed in French (in uppercase-only text) and not at all
in Dutch.
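
For anyone who wants to check the code positions, here is a
minimal Python sketch. It assumes Python's built-in
"iso8859_1" codec matches the standard; the helper name
latin1_byte is mine, purely for illustration.

```python
def latin1_byte(ch):
    """Return the Latin-1 byte value of ch, or None if Latin-1 lacks it."""
    try:
        return ch.encode("iso8859_1")[0]
    except UnicodeEncodeError:
        return None

lower_y = latin1_byte("\u00ff")  # lowercase y-diaeresis
upper_y = latin1_byte("\u0178")  # uppercase y-diaeresis
sharp_s = latin1_byte("\u00df")  # German sharp-s

print(hex(lower_y))  # 0xff
print(upper_y)       # None: the uppercase form is not in Latin-1
print(hex(sharp_s))  # 0xdf

# In this part of Latin-1, each uppercase letter sits 0x20 below its
# lowercase partner, so the slot an uppercase y-diaeresis would have
# taken is the one sharp-s occupies.
print(hex(lower_y - 0x20))  # 0xdf
```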
After Latin-1 (8859-1) was devised in 1985, it was quickly
paralleled by Latin-2,3,4 (8859-2,3,4), which were intended
to be regional equivalents: roughly speaking West, East,
South, and North respectively. However, regionalism did
not turn out to be very successful, as it cut across
trade connections and relative-importance relationships.
Danes, e.g., wanted the Latin-1 charset that the rest of
Western Europe used, not a Latin-4 charset that could handle
Greenlandic too.
The Turks rebelled directly, using not Latin-3 as the rules
suggested, but a variant of Latin-1 with the Icelandic letters
replaced by Turkish ones. (Latin-3 is now primarily used by
Esperantists; it was designed to handle Turkish, Esperanto,
and Maltese.) Turkish Latin-1 eventually got standardized
as Latin-5 (8859-9).
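
The Latin-1/Latin-5 relationship is easy to see by diffing
the two tables. A rough sketch, assuming Python's built-in
"iso8859_1" and "iso8859_9" codecs reflect the standards;
charset_diff is a hypothetical helper name.

```python
def charset_diff(a, b):
    """Map each byte in 0xA0..0xFF to (char in a, char in b) where they differ."""
    diffs = {}
    for byte in range(0xA0, 0x100):
        raw = bytes([byte])
        ca, cb = raw.decode(a), raw.decode(b)
        if ca != cb:
            diffs[byte] = (ca, cb)
    return diffs

turkish = charset_diff("iso8859_1", "iso8859_9")
for byte, (latin1, latin5) in sorted(turkish.items()):
    print(hex(byte), latin1, "->", latin5)

# Six positions differ (0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE): eth,
# y-acute, and thorn give way to g-breve, dotted/dotless i, and
# s-cedilla.
```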
With the further dominance of Latin-1 on the Web and its
status as a subset of the Western European Windows character
set CP1252 (whereas Latin-2 is *not* a subset of the
Central European Windows charset CP1250), later charsets
tended to be variants of Latin-1 with the less useful
characters replaced by locally needed letters.
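
The subset claim can be checked mechanically. This is only a
sketch: it assumes Python's built-in codecs are faithful, the
name mismatches is made up for the example, and it compares
just the graphic range 0xA0..0xFF (the Windows code pages put
printable characters where 8859 has C1 controls, so "subset"
only makes sense for that range).

```python
def mismatches(iso_name, win_name):
    """List the bytes in 0xA0..0xFF that the two charsets decode differently."""
    bad = []
    for byte in range(0xA0, 0x100):
        raw = bytes([byte])
        if raw.decode(iso_name) != raw.decode(win_name):
            bad.append(byte)
    return bad

western = mismatches("iso8859_1", "cp1252")
central = mismatches("iso8859_2", "cp1250")

print(western)        # []: every Latin-1 graphic character keeps its place
print(len(central))   # nonzero: the Central European pair disagrees
print(0xA1 in central)  # True: A-ogonek in Latin-2, caron in the Windows set
```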
Hence Latin-6 (8859-10) and its replacement Latin-7 (8859-13)
are moderately Latin-1 compatible and are meant to supersede
Latin-4 for the Baltic languages that Latin-1 can't
handle, while handling the Nordic languages (including Saami)
as well; Latin-8 (8859-14) is the Celtic charset I discussed
earlier; Latin-9 (8859-15) is a direct replacement for
Latin-1 including the euro symbol and some less-used
French and Finnish characters in place of rarely used symbols;
the forthcoming Latin-10 (8859-16) will be Latin-1 compatible
for letters, and will replace almost all the symbols with
letters from the Latin-2 repertoire.
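
To see which of these charsets covers a given letter, one can
simply try to encode it. A small sketch, assuming the Python
codec names below correspond to Latin-1, Latin-5, Latin-8, and
Latin-9; encodable_in and the sample letters are my own choices
for illustration.

```python
CHARSETS = ["iso8859_1", "iso8859_9", "iso8859_14", "iso8859_15"]

def encodable_in(ch):
    """Return the charsets from CHARSETS that can represent ch."""
    found = []
    for name in CHARSETS:
        try:
            ch.encode(name)
            found.append(name)
        except UnicodeEncodeError:
            pass
    return found

samples = {
    "e-acute": "\u00e9",
    "w-circumflex": "\u0175",
    "euro sign": "\u20ac",
    "oe ligature": "\u0153",
    "g-breve": "\u011f",
}
for label, ch in samples.items():
    print(label, encodable_in(ch))

# e-acute is in all four; w-circumflex only in iso8859_14; the euro
# sign and oe ligature only in iso8859_15; g-breve only in iso8859_9.
```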
> AFAIK only Welsh puts a circumflex on top of {w} :)
Almost certainly.
--
There is / one art || John Cowan <jcowan@...>
no more / no less || http://www.reutershealth.com
to do / all things || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein