Sumerian and Unicode (was: Kuraw and Unicode)

From: John Cowan <cowan@...>
Date: Saturday, October 11, 2003, 7:38
Paul Bennett scripsit:

> Speaking of things Unicode, does anybody in the know, know whether
> Sumerian is (planned for inclusion) in Unicode? I assume it could be
> included in very much the same way as Chinese, i.e. in some kind of
> stroke-type count order. Since other extinct languages are in there
> (e.g. Gothic) and since Sumerian is still very much a field of
> ongoing study, I'm sure it would make a valid candidate.

It is planned, but it's going to be a large multi-year effort. The trouble is that Sumerian is a multimillennial script, and over that time characters split and merged and were invented and abandoned seemingly at random. In general, Unicode encodes underlying emic forms, not etic shapes, but in Sumerian (and it's the same story with Egyptian, BTW), we don't have a 100% clear picture of which distinctions are etic and which are emic. In early writing, a distinction may be merely etic which later comes to be emic. And the reverse happens too.

Once a character is in Unicode, it's in; it can be discouraged or deprecated, but not removed. (Korean syllables *were* removed, and the resulting fuss was more than enough to prevent that ever happening again.) So it's important to get these things right the first time.

In addition, the pressure to actually encode the large morphosyllabic systems other than CJK really isn't there. There are no countries pushing for them, and scholars work exclusively in Latin transliteration anyhow.

Old Persian cuneiform, which looks similar but works on totally different principles (it's a sort of abugida where different letters have different implicit vowels, or it can be seen as an incomplete syllabary, or a mixture of syllabary and alphabet), will be in the next release, Unicode 4.1.
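
To make the "incomplete syllabary" description of Old Persian concrete, here is a rough sketch that groups the encoded signs by consonant and lists which inherent vowels each one has. It assumes a Python whose `unicodedata` tables already include the Old Persian block, and it assumes that the character names follow the pattern "OLD PERSIAN SIGN" plus a transliteration ending in the vowel. The function name `inherent_vowels` and the code point range are illustrative choices, not anything from the standard itself.

```python
import unicodedata
from collections import defaultdict

# Assumed naming pattern for the phonetic signs in the Old Persian block.
PREFIX = "OLD PERSIAN SIGN "


def inherent_vowels(first=0x103A0, last=0x103C3):
    """Map each consonant to the sorted list of vowels it has a sign for.

    The default range is assumed to cover only the phonetic signs
    (not the ideograms, word divider, or numbers later in the block).
    """
    groups = defaultdict(list)
    for cp in range(first, last + 1):
        # An empty default skips any unassigned code point in the range.
        name = unicodedata.name(chr(cp), "")
        if not name.startswith(PREFIX):
            continue
        sign = name[len(PREFIX):]               # e.g. "DA", "DI", "THA", "A"
        consonant, vowel = sign[:-1], sign[-1]  # split off the final vowel
        groups[consonant].append(vowel)
    return {c: sorted(v) for c, v in groups.items()}


if __name__ == "__main__":
    for consonant, vowels in sorted(inherent_vowels().items()):
        # The empty consonant corresponds to the bare vowel signs.
        print(f"{consonant or '(vowel)':8} {' '.join(vowels)}")
```

Most consonants turn up with only one or two vowels rather than a full set, which is the sense in which the system can be read as an incomplete syllabary.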