Re: Digraphic letters (was: Dutch "ij")
From: John Cowan <jcowan@...>
Date: Monday, July 22, 2002, 12:39
Ray Brown scripsit:
> How would you define 'character'?
Ah, *that* is so hard that I fear I'm unable! (Lewis Carroll)
Thus speaks the Unicode Standard:
(1) The smallest component of written language that has
semantic value; refers to the abstract meaning and/or shape,
rather than a specific shape (see also glyph), though in
code tables some form of visual representation is essential
for the reader's understanding. (2) A unit of information
used for the organization, control, or representation of
textual data.
But in fact I think that characters are like points in Euclid: truly
undefined, and having some sort of definition (Euclid's was "that which
has no part") only in order to satisfy the desire to define everything.
> But I was commenting here on the term 'grapheme' which some people use. I
> am never quite certain what they regard as the 'smallest/basic unit of
> writing' is. I do recall somewhere an argument whether lower case {i}
> was one or two graphemes.
>
> Also in the "grapheme" terminology, the various forms of the character {a},
> including its upper case variant, are termed "allographs". It appears
> from what you say above, that we would say an 'allograph' is a variant of
> a grapheme with its own distinctive glyph. Or am I going wildly astray?
>
> I would greatly appreciate your definitions of these terms.
I think the point to be made about graphemes is that, like phonemes,
they are defined with respect to a particular orthographical convention.
"b" and "d" are distinct graphemes in English for the same reason that
/b/ and /d/ are distinct phonemes: it's easy to find minimal pairs. With
a little work, we can find minimal pairs for "p" and "P" as well, e.g.
"polish" and "Polish".
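The minimal-pair test can be sketched mechanically. The Python below is
only an illustration under stated assumptions: the function name
is_minimal_pair and the sample words are hypothetical, chosen for this
example. It treats two spellings as a minimal pair for graphemes x and y
when they have the same length and differ in exactly one position, with
x in one spelling and y in the other.

```python
def is_minimal_pair(word_a, word_b, x, y):
    """Return True if the two spellings differ in exactly one position,
    and that difference is x versus y (in either order)."""
    if len(word_a) != len(word_b):
        return False
    # Collect the character pairs at every position where the spellings differ.
    diffs = [(a, b) for a, b in zip(word_a, word_b) if a != b]
    return len(diffs) == 1 and set(diffs[0]) == {x, y}


print(is_minimal_pair("bad", "dad", "b", "d"))        # True
print(is_minimal_pair("polish", "Polish", "p", "P"))  # True
print(is_minimal_pair("bad", "dab", "b", "d"))        # False: two positions differ
```

The sketch compares single characters only, so a digraph such as {ch}
would need the words segmented into graphemes first.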
Thus the question of whether the dot on the "i" in Turkish is a separate
grapheme is resolved by Occam's Razor: we gain nothing by abstracting
it away, since either there are two graphemes "i" and dotless-i, or
two graphemes dotless-i and dot. Better then to stick to the overt
level and recognize i and dotless-i.
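For comparison, here is a minimal sketch of how Unicode encodes the four
Turkish letters involved, using only Python's standard unicodedata
module. The helper name describe is an assumption made for this example;
the code points and names are those of the Unicode character database.

```python
import unicodedata

# The four Turkish letters: dotted and dotless i, lower and upper case.
letters = ["i", "\u0131", "I", "\u0130"]


def describe(ch):
    """Return (code point, Unicode name, canonical decomposition) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


for ch in letters:
    print(describe(ch))
```

Run as written, the loop shows that i (U+0069) and dotless-i (U+0131)
both have empty decompositions, i.e. each is encoded as an atomic
character, while capital dotted I (U+0130) carries the canonical
decomposition "0049 0307", that is, I plus COMBINING DOT ABOVE.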
> But if we can say that a-e ligature is a single letter in one language,
> but a ligature of two letters in another, then it seems to me that we
> can also say that, e.g. {ch} are two separate letters in English but a
> single composite letter in Welsh & Spanish.
Fair enough.
--
Schlingt dreifach einen Kreis vom dies! || John Cowan <jcowan@...>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau, || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies. -- Coleridge (tr. Politzer)