Re: Phonetics

From:John Vertical <johnvertical@...>
Date:Saturday, April 14, 2007, 14:48
Mark:
>I think you are overcomplicating what is really a simple concept.
>
>Forget Unicode. When English-speakers say the alphabet has 26 letters,
>what are they counting? Not glyphs. Certainly not bit patterns.
>Those are abstract characters.
>
>You are, I think, interpreting "glyph" too broadly. In Unicode terms, a
>glyph is A SINGLE PARTICULAR GRAPHICAL REPRESENTATION OF
>A CHARACTER. There are zillions of glyphs for Latin small letter a, but it
>is still one character. If you put a macron over it, that's still one
>character, even though there are different Unicode code points you could
>use to construct it.
I think I have implied over the discussion that I'm not familiar with the details of the Unicode terminology. I have indeed been using "glyph" in a wider sense than just that, and it would seem that we largely agree if that's the meaning you've been using... But see more below.

Joseph:
>So would you say that Latin, Cyrillic, Greek, and Coptic "A" are the same
>abstract character or three different ones? Latin and Futhark "R"?
>Aramaic and Latin "L"? I think the answer to these questions might go a
>long way towards defining the terms we're using here.
>
>Then once we have an answer to that, what about Latin and Cherokee "M"?
>They have the same shape, but totally different sound values, one being
>/m/, the other being /lu/, but historically they originate from the same
>character.
IMO all of those are still the same "glyphs". Okay, let's say "letter shapes" instead. For example, "A" is the letter shape that consists of two lines that meet at the top and a horizontal line between them. But notice that, despite allowing for different particular written forms, this is still a definition that's grounded in the actual appearance of the letter. (So it is certainly abstract in the sense that it's not a physical entity, but it's not abstract in the sense of not being even a possible property of physical entities.)

One of my arguments here was that, if the letter shape of "a with macron" can be encoded in multiple ways with Unicode, then surely that is a situation equal to how "A" can be encoded in multiple ways (be it in Latin, Greek, Cyrillic etc.) I do agree that Unicode has different names for all the different A's, but then again, "LATIN SMALL LETTER A WITH MACRON" is not the same string as "LATIN SMALL LETTER A followed by COMBINING MACRON" either. I'm still not sure whether there exists some level of coding, in the font renderer or thereabout, where the latter two will necessarily be treated equally but the A's still kept distinct.

Anyway, rendered with this newly-clarified terminology, my original argument and a few more:

* C with cedilla and C with comma below are not the same letter shape.
* They may or may not have the same encoding (be it the binary data, the Unicode code point, or some higher level of encoding.)
* There is no _universal_ connection between letter shapes and encodings (see: the dodgily made dingbats font example), though there naturally IS a connection that specifies what letter shape a given encoding SHOULD (and usually does) produce. I'm not sure what name would be good for that connection specifically. "Standard shape"?
* The word "character" or "letter" is an umbrella term that can mean either the letter shape, the encoding, or its meaning. Or a combination of some of these. As I demonstrated before, it cannot be used in a sense that's not based on any of these, and equating two different senses makes as much sense as equating two different senses of any homophone.
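On the open question above, whether some level treats the two encodings of "a with macron" alike while keeping the look-alike A's distinct: Unicode normalization (canonical equivalence) appears to be such a level, though it lives in text processing rather than in the font renderer. A minimal Python sketch using only the standard unicodedata module; the helper name canonically_equal and the sample strings are illustrative assumptions, not anything from this thread:

```python
import unicodedata

def canonically_equal(s, t):
    """Compare two strings after NFC normalization (canonical equivalence)."""
    return unicodedata.normalize("NFC", s) == unicodedata.normalize("NFC", t)

# Two encodings of "a with macron".
precomposed = "\u0101"   # LATIN SMALL LETTER A WITH MACRON
combining = "a\u0304"    # LATIN SMALL LETTER A + COMBINING MACRON

print(precomposed == combining)                   # raw code points differ
print(canonically_equal(precomposed, combining))  # equal after normalization

# Look-alike capital A's from three scripts stay distinct under normalization.
lookalikes = {"Latin": "\u0041", "Greek": "\u0391", "Cyrillic": "\u0410"}
for script, ch in lookalikes.items():
    print(script, unicodedata.name(ch), canonically_equal(ch, "A"))

# c with cedilla vs. c with comma below: different combining marks,
# so normalization does not unify them either.
c_cedilla = "\u00e7"          # LATIN SMALL LETTER C WITH CEDILLA
c_comma_below = "c\u0326"     # c + COMBINING COMMA BELOW
print(canonically_equal(c_cedilla, "c\u0327"))      # c + COMBINING CEDILLA
print(canonically_equal(c_cedilla, c_comma_below))
```

Under these assumptions, the two macron encodings compare equal once normalized, while the Latin, Greek, and Cyrillic A's and the cedilla/comma-below pair remain distinct.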
>Actually, Cherokee throws in a different problem. The Cherokee alphabet
>includes all but 4 of the characters of the Roman alphabet, then adds a
>bunch of characters unique to Cherokee. One could describe Icelandic (for
>example) in the same way, using most of the Latin characters, then adding
>some unique to that language, pronouncing nearly all of the characters
>different than say, English. What objective criteria should we use to call
>Cherokee a separate alphabet, while keeping Icelandic under the Roman
>alphabet? I think they're clearly separate, I'm just wondering what other
>people think should be the distinguishing criteria.
Again IMO, that would be on the basis of how the letter shapes correspond to sounds. And that's plural "shapes", something that can only be considered in the framework of the whole alphabet, not with individual letters. If, however, you want to consider languages with non-identical orthographies to have the same alphabet, I think that will necessarily require a partially arbitrary definition of "alphabet". (The Unicode-encoding-based arguments are a sub-category of this.)

(Sorry to still go on about this...)

John Vertical
