
Re: X-SAMPA { and }

From: Jörg Rhiemeier <joerg.rhiemeier@...>
Date: Thursday, November 8, 2001, 23:23
Lars Henrik Mathiesen <thorinn@...> writes:

> > Date: Thu, 8 Nov 2001 01:38:31 +0100
> > From: Jörg Rhiemeier <joerg.rhiemeier@...>
> >
> > Lars Henrik Mathiesen <thorinn@...> writes:
> >
> > > I'm not talking about using another system if X-SAMPA doesn't suit
> > > your needs --- noone can object to that.
> >
> > No. For example, I find that X-SAMPA doesn't suit my needs when I am
> > going to present phonological data in e-mails (whether in CONLANG or
> > in private communication with friends) or on web pages, and thus
> > I don't use it. After all, this application is AFAIK not what
> > X-SAMPA was made for anyway.
>
> Quoting from <URL:http://www.phon.ucl.ac.uk/home/sampa/x-sampa.htm>:
>
>    Using these codes, you can for example include IPA-phonetic
>    transcriptions of all kinds in e-mail messages or other forms of
>    electronic exchange. Wherever an IPA character set is not
>    available, X-SAMPA will provide a workable alternative.
>
> Straight from the keyboard of the designer. (It's not like that page
> is hard to locate, it's the first one Google finds when you search for
> X-SAMPA).
Well, the X-SAMPA page does not say that it is meant to be translated to and from IPA by machine, but the SAMPA page gives this impression. I have always seen SAMPA as a "machine-readable" rather than "human-readable" encoding of IPA, useful for transmitting IPA via e-mail, and possibly also as a way to type IPA characters in an IPA editor. But that is, I repeat, only my impression of what the designers meant it for. And anyway, to paraphrase the flight director in the film _Apollo 13_, "I don't want to know what something is made for, I want to know what it can do!"
> > X-SAMPA is intended to be converted into actual IPA *automatically*,
> > or so I am told.
>
> It would be quite simple to do so, since the designer has in fact
> understood how to make a easily parsable code. (There's a nit or two,
> like the _ marking either diacritics or a tie bar, but there's no
> guesswork involved).
>
> But there doesn't seem to be anything out there to do so now.
I haven't seen a converter yet, either. And that counts, if anything, *against* X-SAMPA. It is designed to be machine-readable, not human-readable, but no-one has seen an X-SAMPA reading machine (i.e., a conversion program) yet. Bummer.
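To make the "easily parsable" point concrete, here is a rough sketch of what such a converter could look like, assuming a greedy longest-match scan over a symbol table. The function name `xsampa_to_ipa` and the table are hypothetical; the table covers only a handful of X-SAMPA symbols, and anything not listed is passed through unchanged, so this is an illustration rather than a complete implementation.

```python
# Hypothetical, deliberately tiny subset of the X-SAMPA table.
# Code points are written as escapes so the source stays pure ASCII.
XSAMPA_TO_IPA = {
    "{": "\u00e6",    # ae ligature (near-open front unrounded vowel)
    "}": "\u0289",    # barred u (close central rounded vowel)
    "&": "\u0276",    # small capital OE (open front rounded vowel)
    "2": "\u00f8",    # o with stroke (close-mid front rounded vowel)
    "9": "\u0153",    # oe ligature (open-mid front rounded vowel)
    "T": "\u03b8",    # theta (voiceless dental fricative)
    "D": "\u00f0",    # eth (voiced dental fricative)
    "S": "\u0283",    # esh (voiceless postalveolar fricative)
    "N": "\u014b",    # eng (velar nasal)
    "@": "\u0259",    # schwa
    ":": "\u02d0",    # length mark
    "r\\": "\u0279",  # turned r (alveolar approximant), two-character symbol
    "_0": "\u0325",   # combining ring below (voiceless), two-character symbol
}


def xsampa_to_ipa(text):
    """Convert X-SAMPA to IPA by greedy longest-match lookup.

    Characters that are not in the (incomplete) table are copied as-is.
    """
    max_len = max(len(symbol) for symbol in XSAMPA_TO_IPA)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "r\" wins over plain "r".
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += size
                break
        else:
            # No table entry starts here: pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(xsampa_to_ipa("a elbereT gilTo:niel"))
    print(xsampa_to_ipa("j2rg"))
```

Because unknown input is passed through silently, a sketch like this will happily produce garbage when fed something that is not X-SAMPA; a stricter variant might raise an error on unlisted symbols instead.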
> Once I get something installed on my home system (FreeBSD 4.3) that
> will actually attempt to place random Unicode diacritics on IPA base
> characters, I fully intend to code up a perl script to convert between
> X-SAMPA and Unicode IPA.
>
> Which is one reason I'm arguing so hard for using the standard version
> of X-SAMPA: I don't want people complaining to me that they get small
> caps OE's instead of æ's when they run someone else's pseudo-X-SAMPA
> through my converter.
Do what you want; dumb people will still try to feed just about anything that's not X-SAMPA into your X-SAMPA converter and complain that the result is garbage. You can't stop them. That's the GIGO principle, known for as long as there have been computers. Or, as someone once said, "computers are dumb, but sometimes, people are dumber".
> > I want an intuitive, easy-to-read (by *humans*, not conversion
> > software) system, which X-SAMPA is not.
Of course, it is an advantage if the system is easy to convert automatically, *ceteris paribus*. But that shouldn't be hard to arrange. All this requires is for the system to be unambiguous; there is no need to pick non-intuitive character shapes.
> I don't think you'll get an intuitive system for phonetic notation.
> Even real IPA isn't intuitive. For phonemic, what's wrong with SAMPA?
When I say "intuitive", I mean that the first impression a transcription gives should not be too far off the actual meaning of the symbols. Most IPA characters resemble well-known Roman (or Greek) letters, and their phonetic value is usually not too far from the "usual" value of the letter the IPA symbol resembles most. Of course you still need to know the IPA chart in order to interpret it correctly, but you get a meaningful impression at first glance: you instantly get a vague idea of what the sample might sound like, and you won't be too far off.

This is not the case with X-SAMPA! Many, many X-SAMPA symbols don't look like letters at all but like random dingbats, and the first impression is that of line noise. To take just a single example: the IPA barred-u looks similar to a u, and one might guess that it represents some kind of u-like vowel even without checking the IPA chart, and one would be right. It is a close central rounded vowel, halfway between /u/ and /y/. But the X-SAMPA character [}] tells the reader *nothing* on the intuitive level. You instinctively start looking for the matching opening bracket. Either you find it (because there is a near-open front unrounded vowel somewhere to the left of it), in which case it won't help, or you don't. Why not choose something that looks more u-like, such as [*u]?

X-SAMPA would spell my first name phonemically as /j2rg/, phonetically as [j9`g_0]; you might guess that the digits represent some kind of vowels, but which? Using a system I have concocted myself, I would write /j"org/ and [j"O^rg_h], and you could guess that the vowel is somewhat "o-like", or, if you know that quite a number of languages use an "o" with two dots on it for a mid front rounded vowel, that the same is meant here, and you would be right. By contrast, I call the X-SAMPA spellings misleading and distracting. This lack of intuitiveness is fine for a conversion program that has no sense of intuition anyway, but it makes the notation much harder for humans to read.
Of course, some other ASCII-IPA schemes have the same kind of problems. The Kirschenbaum choice of [K] for a voiceless lateral fricative (or something like that) is misleading, no doubt; intuition would tell you that it must be some kind of dorsal obstruent. And that's just one example.
> > > I'm talking about people who find that X-SAMPA is exactly what they
> > > need, except that it would look so much nicer if we just made this
> > > teeny little change, the use of which will of course be intuitively
> > > obvious to everyone seeing the transcription.
> >
> > If people say they use X-SAMPA, they should strictly follow the
> > X-SAMPA standard. But if they don't say they use X-SAMPA, they can do
> > what they want! (As long whoever the message is meant for, still
> > understands what the author means.)
>
> > And if someone thinks that he doesn't like the way X-SAMPA does this
> > or that, he is free to change it, only that it is no longer X-SAMPA
> > then. But that doesn't hurt.
>
> Well, my argument is that it does hurt, by confusing readers and
> making it harder to collect data in consistent notations.
I admit that it would be easier if everybody used the same encoding, but must we pick one of the ugliest and least human-readable schemes in circulation? Every system has its pros and cons, and new systems will be designed in the future. It is much the same problem as with IALs.
> > While X-SAMPA is *a* standard, it is (fortunately, IMHO) not *the*
> > standard on CONLANG.
>
> I have to admit that the proportion of people using SAMPA or X-SAMPA
> is now large enough that I've stopped trying to remember any other
> system. With the amount of traffic on the list now, any data given in
> Kirschenbaum or whatever provokes a why-bother reaction.
That's not a good reaction. It is pretty much like, "I can design my pages exclusively for the MicroSSoft Internet Exploder and put Word documents on the web with impunity. Of course the Linux/Unix/BSD users won't be able to read them, but they are less than 10%; most people use Windoze anyway, so why bother". Of course, there is a difference because the X-SAMPA people don't want to conquer the world (at least not as far as I know). They just suggested a solution for a particular problem. And I don't reject X-SAMPA because I find it (or its creators) "evil"; I am just of the opinion that it has a number of shortcomings that could have been avoided. But people whose posts are ignored (or even flamed) merely because they use a different encoding scheme won't be happy with that, no matter how nice the X-SAMPA designers are.

Actually, I haven't really seen much X-SAMPA yet on CONLANG, unless you count as X-SAMPA everything in which only those symbols occur that are the same in all ASCII-IPA schemes (and there are quite a few of those, including the most commonly used ones). Sure, "/a elbereT gilTo:niel/" *is* X-SAMPA, but it is also Kirschenbaum, KPA, or whatever, and you can't tell which system the author had in mind unless symbols unique to one system occur. And it also seems to me that those who find that certain X-SAMPA symbols are poorly chosen ("{" and "}" are at the top of the list, no doubt) by far outnumber those who actually, and consistently, use X-SAMPA. If one were to conduct a vote on CONLANG on whether X-SAMPA should be adopted as the "official CONLANG standard", I am sure the NO votes would outnumber the YES votes.

Of course, CONLANG doesn't stand alone in the world; there are many, many other people who need to encode IPA in ASCII. But even in this greater world, X-SAMPA is far from being the universally established standard. The plurality of ASCII-IPA encoding schemes is a fact, and will probably remain so as long as we need such encodings.
I'd guess that people won't settle on X-SAMPA (or anything else) as a truly universal standard (whether de facto or de jure) within the next 10 or 15 years - and by then, everybody will have Unicode and we can all happily use actual IPA and throw all those ASCII-IPA schemes away.
> For data in X-SAMPA-with-random-replacements, that reaction is only
> slightly delayed.
That's one reason why I don't like "X-SAMPA-with-random-replacements". The other is that there is IMHO so much to change in X-SAMPA to make it suitable for human-readable e-mail use that it is a better idea to come up with an entirely new system. And isn't the fact that there are so many "dialects" of X-SAMPA a strong indicator that it *doesn't* satisfy the customers?

Jörg.