Re: X-SAMPA { and }
From: Jörg Rhiemeier <joerg.rhiemeier@...>
Date: Thursday, November 8, 2001, 23:23
Lars Henrik Mathiesen <thorinn@...> writes:
> > Date: Thu, 8 Nov 2001 01:38:31 +0100
> > From: =?iso-8859-1?Q?J=F6rg?= Rhiemeier <joerg.rhiemeier@...>
> >
> > Lars Henrik Mathiesen <thorinn@...> writes:
> >
> > > I'm not talking about using another system if X-SAMPA doesn't suit
> > > your needs --- no one can object to that.
> >
> > No. For example, I find that X-SAMPA doesn't suit my needs when I am
> > going to present phonological data in e-mails (whether in CONLANG or
> > in private communication with friends) or on web pages, and thus
> > I don't use it. After all, this application is AFAIK not what
> > X-SAMPA was made for anyway.
>
> Quoting from <URL:http://www.phon.ucl.ac.uk/home/sampa/x-sampa.htm>:
>
> Using these codes, you can for example include IPA-phonetic
> transcriptions of all kinds in e-mail messages or other forms of
> electronic exchange. Wherever an IPA character set is not
> available, X-SAMPA will provide a workable alternative.
>
> Straight from the keyboard of the designer. (It's not like that page
> is hard to locate, it's the first one Google finds when you search for
> X-SAMPA).
Well, the X-SAMPA page does not say that it is meant to be translated
to and from IPA by machine, but the SAMPA page gives this impression.
I have always seen SAMPA as a "machine-readable" rather than
"human-readable" encoding of IPA, useful to transmit IPA via e-mail,
and possibly also as a way to type IPA characters in an IPA editor.
But that is, I repeat, only my impression of what the designers
meant it for. And anyway, to quote the chief engineer in the film
_Apollo 13_, "I don't want to know what something is made for,
I want to know what it can do!"
> > X-SAMPA is intended to be converted into actual IPA *automatically*,
> > or so I am told.
>
> It would be quite simple to do so, since the designer has in fact
> understood how to make an easily parsable code. (There's a nit or two,
> like the _ marking either diacritics or a tie bar, but there's no
> guesswork involved).
>
> But there doesn't seem to be anything out there to do so now.
I haven't seen a converter yet, either. And that counts, if anything,
*against* X-SAMPA. It is designed to be machine-readable, not
human-readable, but no-one has seen an X-SAMPA reading machine
(i.e., a conversion program) yet. Bummer.
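
For what it is worth, the core of such a converter could look roughly
like the following Python sketch. It is only an illustration under
stated assumptions: the table is a small excerpt of the X-SAMPA chart,
the names XSAMPA_TO_IPA and xsampa_to_ipa are made up for this example,
and lowercase ASCII letters are simply passed through unchanged. Longer
symbols such as "_0" are matched before shorter ones, and anything not
in the table is rejected rather than guessed at.

```python
import string

# Small excerpt of the X-SAMPA table (assumption: a real converter
# would need the complete chart, not just these entries).
XSAMPA_TO_IPA = {
    "{": "\u00e6",    # near-open front unrounded vowel
    "}": "\u0289",    # close central rounded vowel (barred u)
    "2": "\u00f8",    # close-mid front rounded vowel
    "9": "\u0153",    # open-mid front rounded vowel
    "&": "\u0276",    # open front rounded vowel (small capital OE)
    "T": "\u03b8",    # voiceless dental fricative
    "@": "\u0259",    # schwa
    ":": "\u02d0",    # length mark
    "`": "\u02de",    # rhoticity hook
    "_0": "\u0325",   # voiceless diacritic (combining ring below)
    "r\\": "\u0279",  # alveolar approximant
}

# Lowercase letters are copied as-is in this sketch (strict IPA would
# use U+0261 for "g"; that refinement is left out here).
PASSTHROUGH = set(string.ascii_lowercase)
MAX_LEN = max(len(key) for key in XSAMPA_TO_IPA)


def xsampa_to_ipa(text):
    """Convert an X-SAMPA string to Unicode IPA by greedy longest match."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible symbol first, then shorter ones.
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            ch = text[i]
            if ch not in PASSTHROUGH:
                # Garbage in: refuse instead of producing garbage out.
                raise ValueError(f"not in this X-SAMPA excerpt: {ch!r} at position {i}")
            out.append(ch)
            i += 1
    return "".join(out)


print(xsampa_to_ipa("j2rg"))
print(xsampa_to_ipa("j9`g_0"))
```

Raising an error on unknown input is a design choice of this sketch; a
more forgiving converter might copy unknown characters through instead.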
> Once I get something installed on my home system (FreeBSD 4.3) that
> will actually attempt to place random Unicode diacritics on IPA base
> characters, I fully intend to code up a perl script to convert between
> X-SAMPA and Unicode IPA.
>
> Which is one reason I'm arguing so hard for using the standard version
> of X-SAMPA: I don't want people complaining to me that they get small
> caps OE's instead of æ's when they run someone else's pseudo-X-SAMPA
> through my converter.
You can do what you want; dumb people will still try to feed just about
anything that's not X-SAMPA into your X-SAMPA converter and complain
that the result is garbage. You can't stop them. That's the GIGO
principle, known for as long as there have been computers. Or, as someone
once said, "computers are dumb, but sometimes, people are dumber".
> > I want an intuitive, easy-to-read (by *humans*, not conversion
> > software) system, which X-SAMPA is not.
Of course, it is an advantage if it is easy to convert automatically,
*ceteris paribus*. But that shouldn't be hard to arrange. All this
requires is for the system to be unambiguous; no need to pick
non-intuitive character shapes.
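
To make "unambiguous" a bit more concrete: one way to test a proposed
symbol set is to count how many ways a given string can be cut up into
symbols. The Python sketch below does that. The function name
count_parses and both toy symbol sets are hypothetical, invented for
this illustration and not taken from any existing scheme.

```python
def count_parses(text, symbols):
    """Count the ways `text` can be split into a sequence of `symbols`.

    ways[k] holds the number of segmentations of the first k characters.
    Assumes every symbol is a non-empty string.
    """
    ways = [0] * (len(text) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(text) + 1):
        for sym in symbols:
            start = end - len(sym)
            if start >= 0 and text[start:end] == sym:
                ways[end] += ways[start]
    return ways[len(text)]


# Hypothetical scheme A: "ts" is both an affricate symbol and t + s.
scheme_a = {"t", "s", "ts"}
# Hypothetical scheme B: the affricate gets an explicit joiner.
scheme_b = {"t", "s", "t^s"}

print(count_parses("ts", scheme_a))   # more than one reading
print(count_parses("ts", scheme_b))   # exactly one reading
print(count_parses("t^s", scheme_b))  # exactly one reading
```

A string with exactly one parse can be converted without guesswork, and
the symbols in scheme B still look like the letters they are built from.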
> I don't think you'll get an intuitive system for phonetic notation.
> Even real IPA isn't intuitive. For phonemic, what's wrong with SAMPA?
When I say "intuitive", I mean that the first impression it gives when
seeing it should not be too far off the actual meaning of the symbols.
Most IPA characters do resemble well-known Roman (or Greek) letters,
and their phonetic value is usually not too far away from the "usual"
value of the letter the IPA symbol resembles most. Of course you
still need to know the IPA chart in order to interpret it correctly,
but you get a meaningful impression from the first look. You
instantly get some vague idea of what the sample might sound like, and
you won't be too far off. This is not the case with X-SAMPA!
Many, many X-SAMPA symbols don't look like letters at all but like
random dingbats, and the first impression is that of line noise.
To take just a single example: the IPA barred-u looks similar to a u,
and one might guess that it represents some kind of u-like vowel
even without checking the IPA chart, and one would be right. It is a close
central rounded vowel, halfway between /u/ and /y/. But the X-SAMPA
character [}] tells the reader *nothing* on the intuitive level. You
instinctively start looking for the matching opening bracket. Either
you find it (because there is a near-open front unrounded vowel
somewhere to the left of it), in which case it won't help, or not.
Why not choose something that looks more u-like, such as [*u]?
X-SAMPA would spell my first name phonemically as /j2rg/, phonetically
as [j9`g_0]; you might guess that the digits represent some kind of
vowels, but which? Using a system I have concocted by myself, I would
write /j"org/ and [j"O^rg_h], and you could guess that the vowel is
somewhat "o-like", or, if you know that quite a number of languages
use an "o" with two dots on it for a mid front rounded vowel, that
the same is meant here, and you will be right.
I call this misleading and distracting. This lack of intuitiveness
is fine for a conversion program that has no sense of intuition
anyway, but it makes it much harder to read for humans.
Of course, some other ASCII-IPA schemes have the same kind of
problems. The Kirschenbaum choice of [K] for a voiceless lateral
fricative (or something like that) is misleading, no doubt; intuition
would tell you that it must be some kind of dorsal obstruent.
And that's just one example.
> > > I'm talking about people who find that X-SAMPA is exactly what they
> > > need, except that it would look so much nicer if we just made this
> > > teeny little change, the use of which will of course be intuitively
> > > obvious to everyone seeing the transcription.
> >
> > If people say they use X-SAMPA, they should strictly follow the
> > X-SAMPA standard. But if they don't say they use X-SAMPA, they can do
> > what they want! (As long as whoever the message is meant for still
> > understands what the author means.)
>
> > And if someone thinks that he doesn't like the way X-SAMPA does this
> > or that, he is free to change it, only that it is no longer X-SAMPA
> > then. But that doesn't hurt.
>
> Well, my argument is that it does hurt, by confusing readers and
> making it harder to collect data in consistent notations.
I admit that it would be easier if everybody used the same encoding,
but must we pick one of the most ugly and least human-readable schemes
in circulation? Every system has its pros and cons; new systems will
be designed in the future. Well, it is quite the same problem as with
IALs.
> > While X-SAMPA is *a* standard, it is (fortunately, IMHO) not *the*
> > standard on CONLANG.
>
> I have to admit that the proportion of people using SAMPA or X-SAMPA
> is now large enough that I've stopped trying to remember any other
> system. With the amount of traffic on the list now, any data given in
> Kirschenbaum or whatever provokes a why-bother reaction.
That's not a good reaction. It is pretty much like, "I can design my
pages exclusively for the MicroSSoft Internet Exploder and put Word
documents on the web with impunity. Of course the Linux/Unix/BSD
users won't be able to read them, but they are less than 10%; most
people use Windoze anyway, so why bother". Of course, there is a
difference because the X-SAMPA people don't want to conquer the
world (at least not as far as I know). They just suggested a solution
for a particular problem. And I don't reject X-SAMPA because I find
it (or its creators) "evil"; I am just of the opinion that it has a
number of shortcomings that could have been avoided. But people whose
posts are ignored (or even flamed against) merely because they use a
different encoding scheme won't be happy with that, no matter how nice
the X-SAMPA designers are.
Actually, I haven't really seen much X-SAMPA yet on CONLANG, unless you
count everything as X-SAMPA where only those symbols occur that are
the same in all ASCII-IPA schemes (which are quite many, including the
most commonly used ones). Sure, "/a elbereT gilTo:niel/" *is* X-SAMPA,
but it is also Kirschenbaum, KPA, or whatever, and you can't tell
which system the author had in mind unless symbols unique to one
system occur.
And it also seems to me that those who find that certain X-SAMPA
symbols are poorly chosen ("{" and "}" are at the top of the list,
no doubt) by far outnumber those who actually, and consistently,
use X-SAMPA. If one were to hold a vote on CONLANG on whether X-SAMPA
should be adopted as the "official CONLANG standard", I am sure the NO
votes would outnumber the YES votes.
Of course, CONLANG doesn't stand alone in the world. There are many,
many other people in need of encoding IPA in ASCII; but even in this
greater world, X-SAMPA is far from being the universally established
standard. The plurality of ASCII-IPA encoding schemes is a fact, and
will probably remain so as long as we need such encodings.
I'd guess that people won't settle on X-SAMPA (or anything
else) as a truly universal standard (whether de facto or de jure)
within the next 10 or 15 years - and by then, everybody will have
Unicode and we can all happily use actual IPA and throw all those
ASCII-IPA schemes away.
> For data in X-SAMPA-with-random-replacements, that reaction is only
> slightly delayed.
That's one reason why I don't like "X-SAMPA-with-random-replacements".
The other is that there is IMHO so much to change in X-SAMPA to make
it suitable for human-readable e-mail use that it is a better idea to
come up with an entirely new system.
And isn't the fact that there are so many "dialects" of X-SAMPA a
strong indicator that it *doesn't* satisfy the customers?
Jörg.