
Re: [wEr\ Ar\ ju: fr6m] ?

From:John Cowan <jcowan@...>
Date:Friday, November 9, 2001, 18:58
Lars Henrik Mathiesen wrote:


> Ah, I forgot that we have an expert on Unicode on board. There's a few little things I'd like to know, John --- you can answer in private email if you want to spare the rest of the list:
I'll reply publicly so that the answers get archived, since others may well care in future.
> If the contour symbols combine, they aren't a problem --- I'll just pretend that _R is _B_T, and it all happens magically. Is the same true for the accent signs --- i.e., are U+0301 U+030B supposed to ligature into a high-rising diacritic?
In general, no; they should stack vertically, the 0301 below the 030B. However, it is worth mentioning that the details of ligaturing behavior depend on the font. A minimalistic Arabic font can get away with just one ligature, lam-alef; a decent Arabic or Indic font will have hundreds of ligatures. Vietnamese Latin fonts automatically ligature certain combining characters so that they aren't stacked. The only Unicode-level controls over ligatures are that ZWNJ always breaks up ligatures, and ZWJ (as of Unicode 3.1) encourages their formation where that would not otherwise be the case. For example, most Latin fonts don't ligature ct automatically, but a Unicode 3.1 compliant version could transform c ZWJ t into a ligature if it had one.
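To make the sequences above concrete, here is a minimal sketch. It assumes Python and its standard unicodedata module, neither of which comes from this thread. It only inspects the code points; whether anything stacks or ligatures is still up to the font and renderer.

```python
import unicodedata

ACUTE = "\u0301"         # COMBINING ACUTE ACCENT
DOUBLE_ACUTE = "\u030B"  # COMBINING DOUBLE ACUTE ACCENT
ZWNJ = "\u200C"          # ZERO WIDTH NON-JOINER: breaks up a ligature
ZWJ = "\u200D"           # ZERO WIDTH JOINER: requests a ligature

def describe(text):
    """Return (code point, name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]

# Base letter followed by the two accents, in the order asked about.
stacked = "a" + ACUTE + DOUBLE_ACUTE

# Both marks share a combining class, so canonical reordering
# leaves their relative order (and thus the stacking order) alone.
same_order = unicodedata.normalize("NFD", stacked) == stacked

# "ct" with a ligature requested, and with a ligature forbidden.
ct_request = "c" + ZWJ + "t"
ct_forbid = "c" + ZWNJ + "t"

for row in describe(stacked) + describe(ct_request):
    print(row)
print("order preserved under NFD:", same_order)
```

The joiners are ordinary characters in the string, so a renderer that ignores them simply shows plain "ct".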
> Is the mark for pharyngealized supposed to be U+02C1 or U+02E4? It's hard to see how far over the base line these things are...
U+02C1 is the pharyngealization diacritic. U+02E4 is a superscript U+0295 (reversed glottal stop); I'm not sure what it's used for. The Unicode book refers vaguely to "1989 IPA".
> There don't seem to be characters with glyphs like the one John Wells shows for upstep and downstep (i.e., superscript up and down arrows). I noticed that U+2191 and U+2193 in the Arrows section are annotated as having a different IPA semantics.
In general (though not always in IPA), subscripts and superscripts are encoded only for compatibility. In this case, I would use markup to indicate superscriptness, along with the regular arrow characters.
> My current best display technology is MS Word 2000 with the font Arial Unicode MS (which doesn't combine tone contours, I just tested)
With a font editor you could add appropriate glyphs and ligature table entries.
> I haven't found anything on UNIX that even attempts to place combining diacritics correctly.
It doesn't seem like X font rendering can cope, even using TrueType fonts. Maybe soon.
> So Word is the only way I've been able to view the U+0361, COMBINING DOUBLE INVERTED BREVE, which I want to use for a tie bar. However, it shows it as connecting the two preceding glyphs, whereas the PDF files from unicode.org seem to indicate that it should connect the glyphs around it. Who's right, me or Bill?
Technically, it attaches to the character before it, and just physically hangs over the one following it. You might have better luck with the compatibility half characters.
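The attachment rule can be sketched in a few lines. This assumes Python's standard unicodedata module, and it assumes that the "half characters" are U+FE20 and U+FE21 from the Combining Half Marks block; the thread does not name them. The clusters helper is a hypothetical illustration, not a full grapheme-cluster algorithm.

```python
import unicodedata

TIE = "\u0361"         # COMBINING DOUBLE INVERTED BREVE
LEFT_HALF = "\uFE20"   # COMBINING LIGATURE LEFT HALF (assumed alternative)
RIGHT_HALF = "\uFE21"  # COMBINING LIGATURE RIGHT HALF (assumed alternative)

def clusters(text):
    """Group each combining mark with the character before it.

    Simplified: any character with a nonzero canonical combining
    class is appended to the preceding cluster.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch
        else:
            out.append(ch)
    return out

# The tie bar is typed between the two letters and belongs to the first.
tied = "t" + TIE + "s"

# With half marks, each letter carries its own half of the tie.
halves = "t" + LEFT_HALF + "s" + RIGHT_HALF

print([[f"U+{ord(c):04X}" for c in cl] for cl in clusters(tied)])
print([[f"U+{ord(c):04X}" for c in cl] for cl in clusters(halves)])
```

So in storage order the mark sits between the two letters, even though logically it belongs to the first one.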
> Finally, it seems that the IPA chart wants the diacritic for no audible release to be a spacing modifier letter --- but Unicode only has the combining diacritic U+0321 with that sense.
I presume you mean U+031A: U+0321 is the combining palatal hook. BTW, Unicode considers IPA's treatment of the rhotic and palatal hooks to be a mistake, since they are almost always rendered physically attached to the character. So rather than ligaturing a separate spacing modifier letter, they are defined in Unicode as combining characters.
> I could use U+321D instead, but I'm not sure it would necessarily be designed to fit the role. (It's a quine mark, whatever that is).
Those are the marks you see on proofs of books, or on photographs, to show where the final product will be cropped to.
> (U+0020 U+0321 is another possibility).
Probably best.
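A quick way to check which code point is which, and to build the space-plus-mark sequence, is sketched below. It assumes Python's standard unicodedata module, and it assumes the corrected code point U+031A rather than the U+0321 written in the quoted text.

```python
import unicodedata

# The three code points discussed above.
for cp in (0x031A, 0x0321, 0x0322):
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)))

# A free-standing "no audible release" mark: a space carrying the
# combining character (assumes U+031A is the intended mark).
spacing_no_release = "\u0020" + "\u031A"
print([f"U+{ord(ch):04X}" for ch in spacing_no_release])
```

How well the mark sits over a space still depends on the font.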
> (I'm resigned to having to pick _n (and _0, _4, ...) out of the Super- and Subscripts block, and _1, _2, _3 from Latin-1, but there's the same caveat about design mismatch with the specific IPA superscript letters).
No general-purpose font like AU can serve all purposes equally. Hell, you shouldn't use the same O WITH ACUTE glyph for both Spanish and Polish -- the typical compromise isn't steeply inclined enough for Polish.

--
Not to perambulate || John Cowan <jcowan@...>
the corridors || http://www.reutershealth.com
during the hours of repose || http://www.ccil.org/~cowan
in the boots of ascension. \\ Sign in Austrian ski-resort hotel
