
Re: Digraphic letters (was: Dutch "ij")

From: John Cowan <jcowan@...>
Date: Monday, July 22, 2002, 12:39

Ray Brown scripsit:

> How would you define 'character'?
Ah, *that* is so hard that I fear I'm unable! (Lewis Carroll)

Thus speaks the Unicode Standard:

(1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than a specific shape (see also glyph), though in code tables some form of visual representation is essential for the reader's understanding.

(2) A unit of information used for the organization, control, or representation of textual data.

But in fact I think that characters are like points in Euclid: truly undefined, and having some sort of definition (Euclid's was "that which has no part") only in order to satisfy the desire to define everything.

> But I was commenting here on the term 'grapheme' which some people use. I
> am never quite certain what they regard as the 'smallest/basic unit of
> writing' is. I do recall somewhere an argument whether lower case {i}
> was one or two graphemes.
>
> Also in the "grapheme" terminology, the various forms of the character {a},
> including its upper case variant, are termed "allographs". It appears
> from what you say above, that we would say an 'allograph' is a variant of
> a grapheme with its own distinctive glyph. Or am I going wildly astray?
>
> I would greatly appreciate your definitions of these terms.

I think the point to be made about graphemes is that, like phonemes, they are defined with respect to a particular orthographical convention. "b" and "d" are distinct graphemes in English for the same reason that [b] and [d] are distinct phonemes: it's easy to find minimal pairs. With a little work, we can find minimal pairs for "p" and "P" as well, e.g. [Pp]olish. Thus the question of whether the dot on the "i" in Turkish is a separate grapheme is resolved by Occam's Razor: we gain nothing by abstracting it away, since either way there are two graphemes, "i" and dotless-i, or dotless-i and dot. Better then to stick to the overt level and recognize i and dotless-i.
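
The minimal-pair test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name minimal_pairs and the tiny word lists are hypothetical, and the Turkish pair kir/kır is an example chosen here, not one from the original discussion. Two graphemes count as contrastive for a given orthography if swapping one for the other at a single position turns one attested word into another.

```python
def minimal_pairs(words, a, b):
    """Return sorted pairs (w1, w2) of attested words where w2 is w1
    with exactly one occurrence of grapheme `a` replaced by `b`."""
    vocab = set(words)
    pairs = set()
    for w in vocab:
        for i, ch in enumerate(w):
            if ch == a:
                # Swap the grapheme at this one position only.
                other = w[:i] + b + w[i + 1:]
                if other in vocab:
                    pairs.add((w, other))
    return sorted(pairs)


# Hypothetical toy word lists, one per orthographic convention.
ENGLISH = ["bad", "dad", "big", "dig", "bid", "Polish", "polish"]
TURKISH = ["kir", "kır"]  # dotted i vs. dotless ı (U+0131)

print(minimal_pairs(ENGLISH, "b", "d"))  # [('bad', 'dad'), ('big', 'dig')]
print(minimal_pairs(ENGLISH, "P", "p"))  # [('Polish', 'polish')]
print(minimal_pairs(TURKISH, "i", "ı"))  # [('kir', 'kır')]
```

The sketch treats each Python string element (a Unicode code point) as one grapheme, which is itself an assumption tied to the orthography: it would need a different segmentation for a convention that counts a digraph as a single letter.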

> But if we can say that a-e ligature is a single letter in one language,
> but a ligature of two letters in another, then it seems to me that we
> can also say that, e.g. {ch} are two separate letters in English but a
> single composite letter in Welsh & Spanish.

Fair enough.

-- 
Schlingt dreifach einen Kreis vom dies! || John Cowan <jcowan@...>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)
