TECH: Unicode vs The Rest Of The World (Again)
From: Danny Wier <dawiertx@...>
Date: Friday, April 30, 2004, 21:12
From: "Paul Bennett" <paul-bennett@...>
> I think it is the only way (certainly for long s). Unfortunately, many
> people around here haven't yet bothered to get Unicode mail clients
> (including the very small number for whom this is still a technical
> impossibility for one reason or another). It's quite an amusing situation
> that so many linguists would deny themselves the biggest boon to online
> linguistics since the invention of e-mail, IMO.
>
> More worrying than the mere adoption rate is that the List Server itself
> is **severely** broken when it comes to UTF-8 (and presumably any other
> full 8-bit encoding). It takes byte values (inside message bodies, I don't
> know about inside attachments) 128 thru 149 and subtracts 128 from them,
> leaving you with multi-byte UTF sequences that at best point to the wrong
> character and at worst form a broken character that is unprintable.
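For anyone who wants to see what that kind of mangling does, here is a rough Python sketch. It only simulates the behaviour as Paul describes it (subtract 128 from any byte valued 128 thru 149); the function name simulate_list_server is made up for illustration, and I haven't seen the server's real code.

```python
def simulate_list_server(body: bytes) -> bytes:
    """Hypothetical model of the reported bug: bytes 128..149 lose their high bit."""
    return bytes(b - 128 if 128 <= b <= 149 else b for b in body)


# U+0250 encodes in UTF-8 as C9 90; the trailing byte 0x90 (144) is in the
# affected range. U+0259 encodes as C9 99; 0x99 (153) is outside it.
for ch in ("\u0250", "\u0259"):
    sent = ch.encode("utf-8")
    received = simulate_list_server(sent)
    shown = received.decode("utf-8", errors="replace")
    print(f"U+{ord(ch):04X}: sent {sent.hex()}, received {received.hex()}, shown as {shown!r}")
```

Under this model some characters survive and others don't, depending on whether any of their UTF-8 bytes happen to land in the affected range.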
Welcome to the Unicode Empire. Resistance is futile. ;)
Seriously, here's a policy on Unicode (and non-ASCII encodings in general)
that I propose:
1) Give a spoiler warning at the top of your post or in the Subject: line
saying "Warning: Unicode" or something like that.
2) Only use Unicode when necessary, that is, when you need a character outside
of Latin-1; try to stick to the WGL4 character set if possible. Otherwise, if
you get a Unicode-encoded message from CONLANG and reply to the list,
convert to ISO or Windows Western European before you send.
3) Hebrew, Arabic, Hangul and Chinese-Japanese-Korean characters are okay,
but don't expect everyone to be able to read them. We don't all have Windows
2000/XP.
4) Offer an X-SAMPA alternative, ESPECIALLY if you use anything in the IPA
area of Unicode.
5) Don't use any non-ASCII at all in the Subject: line.
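Points 2 and 5 can be checked mechanically before you hit send. The following is just a sketch in Python, assuming "ISO or Windows Western European" means ISO-8859-1 and Windows-1252; the helper names narrowest_charset and subject_is_ascii are my own invention, not part of any mail client.

```python
def narrowest_charset(text: str) -> str:
    """Return the first charset, narrowest first, that can encode the whole text."""
    for charset in ("us-ascii", "iso-8859-1", "windows-1252"):
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    # Nothing narrower fits, so Unicode really is necessary here.
    return "utf-8"


def subject_is_ascii(subject: str) -> bool:
    """Point 5: the Subject: line should contain ASCII only."""
    return subject.isascii()


body = "The vowel is [\u0250], or [6] in X-SAMPA."
print(narrowest_charset(body))
print(subject_is_ascii("TECH: Unicode vs The Rest Of The World (Again)"))
```

If narrowest_charset comes back with "utf-8", that's the cue to add the warning from point 1 and the X-SAMPA alternative from point 4.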
Also, a few folks here aren't able to read anything beyond ASCII, not even
8-bit Latin-1.