From: Mark J. Reed <markjreed@...>
Date: Tuesday, March 27, 2007, 11:51
We're talking about Unicode, which is based on the idea that abstract
characters exist. Regardless of your opinion on that point, the
decision was made long ago. Given that framework, then, are c with
comma below and c with cedilla alloglyphs?
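
For reference, here is a small sketch of how Unicode encodes the two marks, assuming Python 3 and its standard `unicodedata` module. The helper name `describe` and the sample letters are just illustrative choices. It suggests that the standard treats cedilla and comma below as two separate combining characters, while the Latvian letter keeps a "cedilla" name despite its comma-like shape.

```python
import unicodedata

def describe(text):
    """Return the Unicode names of the code points in the NFD form of text."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]

s_cedilla = "\u015F"   # s with cedilla, as used in Turkish
s_comma = "\u0219"     # s with comma below, as used in Romanian
k_latvian = "\u0137"   # Latvian k; its Unicode name says "WITH CEDILLA"

for letter in (s_cedilla, s_comma, k_latvian):
    print(unicodedata.name(letter), "->", describe(letter))

# c + combining cedilla composes to a single precomposed character,
# but c + combining comma below has no precomposed form, so it stays
# as two code points after NFC normalization.
c_cedilla = unicodedata.normalize("NFC", "c\u0327")
c_comma = unicodedata.normalize("NFC", "c\u0326")
print(len(c_cedilla), len(c_comma))
```

Normalization never turns one mark into the other, so at the encoding level they are not treated as glyph variants of a single character.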
As for the three A's, they come from three separate alphabets. It
would be quite odd to mix alphabets within a single word. More
importantly, if they were unified you would lose the ability to map
each letter to its lowercase equivalent (a, α, or а). Or rather, you
would lose the simple case mapping and make every letter
language-sensitive, essentially repeating the Turkish dotless ı vs.
dotted İ casing problem across the board.
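
To make the case-mapping point concrete, here is a minimal sketch, assuming Python 3 and its built-in, language-independent `str.lower()` and `str.upper()`. The helper name `lowercase_table` is hypothetical. Because the three capital A's are separate code points, each one maps to its own lowercase letter without any knowledge of the language.

```python
import unicodedata

def lowercase_table(letters):
    """Map each capital letter to the name of its default lowercase form."""
    return {ch: unicodedata.name(ch.lower()) for ch in letters}

latin_a = "A"          # U+0041
greek_a = "\u0391"     # Greek capital alpha
cyrillic_a = "\u0410"  # Cyrillic capital a

for capital, lower_name in lowercase_table([latin_a, greek_a, cyrillic_a]).items():
    print(f"U+{ord(capital):04X} -> {lower_name}")

# The Turkish exception: the default mapping is not language-sensitive,
# so dotless i does not survive an upper/lower round trip.
dotless_i = "\u0131"
round_trip = dotless_i.upper().lower()
print(round_trip == dotless_i)
```

With a single unified A, every lowercase conversion would need the kind of language tag that the Turkish round trip above is missing.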
Also remember that Unicode is a compromise developed by a consortium.
Certain decisions were made politically rather than technically, in
order to promote its adoption. A ubiquitous inelegant standard beats
an elegant hardly-used one, or no standard at all. :)
On 3/27/07, John Vertical <johnvertical@...> wrote:
Mark J. Reed <markjreed@...>
> >On 3/23/07, Benct Philip Jonsson <conlang@...> wrote:
> > > Unicode doesn't distinguish very clearly between cedilla and
> > > comma below. The canonical shape used in Latvian and
> > > Rumanian is comma, while the Turkish is cedilla. The
> > > confusing names are a holdover from a time when one thought
> > > the Turkish and Romanian forms could be considered variants
> > > of one another.
> >Why shouldn't they be? Are they not in complementary distribution (one
> >in Turkish, the other in Latvian and Romanian), for starters?
> Many glyphs, even with similar meanings, are in complementary distribution -
> I don't see how that is enough grounds to consider them alloglyphs. How
> about, say, tilde over vs. ogonek?
> >And besides, the actual shape of cedilla can vary... I've seen an
> >Albanian write the ç in her name so that it looked like a lower-case c
> >with an inverted hacek (or a circumflex) below, for example.
> All's fair in handwriting. A few of my professors write kappa as identical
> in form to the lower-case ae digraph.
> >At any rate, I've always taken the variants with comma below and with
> >cedilla below to be a glyph issue: alloglyphs of the same diacritic...
> >a bit like the apostrophe-after vs. caron-above issue with letters Dd
> >and Tt (cf. Ďď, Ťť).
> Yeah, that's pretty much an equal case.
> >Can you say why you think they cannot be considered glyph variants of
> >the same abstract diacritic?
> Because they have clearly different shapes & "abstract diacritics" do not
> exist? Okay, maybe not all that different... But they're even historically
> separate, aren't they?
> >Compare also ó, where the accent "should" have a different slope
> >depending on whether you're writing Spanish or Polish (see
> >example, where "acute" and "kreska" are contrasted).
> And an umlaut "should" differ from a diaeresis both by location and by cursive
> form, etc. This is where standardization comes in. I wouldn't actually mind
> seeing the cedilla and comma below merged, but at least fonts should then be
> consistent about which they display. (And then you would have a pretty good
> case for considering them alloglyphs.)
> >Is this case not similar? The same abstract character with two
> >different, language-specific, glyph realisations?
> >Philip Newton <philip.newton@...>
> IMO "abstract character", I mean any more abstract than the basic
> geometrical structure, is an oxymoron. Is it abstract, or is it written?
> Frex how do we determine which representations of /S/ are the same "abstract
> character"? Is s-caron also one? Esh? <sh>?
> (OTOH I also don't see how, for example, A differs from Α (Greek)
> differs from А (Cyrillic).)
> John Vertical