CPA - An ASCII-based phonetic alphabet

From:	Jörg Rhiemeier <joerg.rhiemeier@...>
Date:	Monday, November 12, 2001, 22:24
Hi friends,

my departure being delayed by one day, I have found the time to post
the new, revised version of the ASCII-IPA scheme I posted a few days
ago.

It now has a name, CPA, and includes the changes suggested on the
list: symbols for palatal and velar laterals (which I had erroneously
left out, believing they don't exist), and revised low vowel symbols
that are more consistent with the rest of the vowel symbols.

Share and enjoy!


                CPA - An ASCII-based phonetic alphabet
                ======================================
                             Version 1.1
                       (C) 2001 Jörg Rhiemeier

CPA is an ASCII representation of the International Phonetic Alphabet
(IPA).  I have called it "CPA", for "Claude's Phonetic Alphabet"
("Claude" is a nickname I used on IRC and MUDs before I lost interest
in those distractions, and it still subsists as the login name on my
home Linux machine), though the "C" could just as well stand for
"Conlang", in case CPA should become an established de facto standard
on CONLANG.

The purpose of CPA is: to provide a system of symbols that can
do anything IPA can do, but exclusively uses 7-bit ASCII characters,
thus being perfectly safe for usage in e-mail or similar media;
that can be converted to and from IPA (encoded in Unicode or something
else) automatically; and that is intuitive and easy to read even
without help from a conversion program.

The question is, of course, why yet another system of this kind when
we already have several others, such as Kirschenbaum and X-SAMPA?
Well, the main reason is that I find most existing schemes unsatisfying.

A good ASCII-IPA encoding ought to be intuitive, i.e. one should be
able to get a fairly good impression of how something is pronounced
from a quick glance at the transcription, which is the case if each
symbol used is similar to a letter commonly used for such sounds or
similar sounds.  IPA does a fairly good job at that; some ASCII
encodings (most notably X-SAMPA) do not.  Why, for example, should [K]
be a lateral fricative, or [{] a lowish front unrounded vowel
(to quote just two blatant examples from X-SAMPA)?  A scheme like
X-SAMPA might do a good job when it comes to automatic conversion to
and from IPA encoded in Unicode or whatever, but why can't it be
intuitive to humans?  CPA is, in my opinion at least, easier to read
for humans, and it should be automatically convertible as well
- I haven't proved the latter yet by writing a working conversion
program, but I think I have come up with an unambiguous scheme.

Another criterion is that the transcription should not be cluttered
with too many non-letter symbols which make the transcription hard to
read.  Kirschenbaum uses many clumsy 3-letter markers in angle
brackets; for example, the far-from-uncommon alveolar trill is
represented by [r<trl>], the voiceless lateral fricative by [s<lat>].
Kirschenbaum and SAMPA also use many characters that are neither
letters nor letter-like (digits and characters such as [@ $ &] at
least have some letter-like graphical quality), and most of these are
counter-intuitive as well; the alveolar flap is [*] in Kirschenbaum,
for example.  A few non-letter symbols are unavoidable, because 52 letters
(26 small and 26 capital) simply aren't enough.  In CPA, all segment
symbols are either letters, letter-like characters, or letters or
letter-like characters with one or two modifier characters ([* " . ` !])
attached.  The use of [*] as a modifier character does not create
ambiguities with the use of asterisks to mark reconstructed or
ungrammatical forms: when asterisks are used with CPA in the latter
sense, they are placed _outside_ the bracketing /.../ or [...], as in
the following example:

        feet < OE /fe:t/ < */f"o:t/ < */f"o:ti/ < */fo:ti/

The modifier character always goes _inside_ /.../ or [...], as in the
following example showing the working of a (fictional)
vowel-centralizing rule:

        /*igul/ < */igul/

This means that in the modern language, the word is /*igul/, with a
central vowel, while the reconstructed proto-form is /igul/ with a
front vowel.  (If the form with the central vowel were reconstructed as
well, it would be */*igul/.)
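
The asterisk convention above can be checked mechanically.  The
following is only a minimal sketch in Python: the function name
parse_form and the returned dictionary are illustrative assumptions,
not part of CPA, and it assumes that the transcription body does not
itself contain the closing delimiter.

```python
import re

# An optional asterisk OUTSIDE the brackets marks a reconstructed or
# ungrammatical form; anything INSIDE /.../ or [...] belongs to the
# transcription, including a leading [*] modifier character.
FORM = re.compile(r'(\*?)(/([^/]*)/|\[([^\]]*)\])')

def parse_form(text):
    """Split a bracketed form into its outer asterisk flag and its body."""
    m = FORM.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a bracketed transcription: {text!r}")
    # Group 3 is filled for /.../, group 4 for [...].
    body = m.group(3) if m.group(3) is not None else m.group(4)
    return {"reconstructed": m.group(1) == "*", "body": body}
```

With this sketch, parse_form('/*igul/') reports a non-reconstructed
form with body '*igul', while parse_form('*/igul/') reports a
reconstructed form with body 'igul'.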

CPA also features three "wild card" segment symbols [# $ %] and six
"wild card" diacritics [^# _# ^$ _$ ^% _%].  These symbols have no
defined values, but can be used for whatever one needs in a specific
language, e.g. as abbreviations for segments that are frequent in the
language discussed, but could otherwise only be expressed by a complex
combination of diacritics, or for weird sounds that might occur in
non-human languages.  Any of the symbols [# $ %] could alternatively
be used as a modifier character to create more segment symbols;
a symbol used as a modifier, however, can no longer be used as
a segment symbol in order to preserve unambiguity.  (But that's indeed
a quite favourable trade-off: you get 50-something new symbols for the
price of one.)  Of course, whenever you use wild cards, make sure you
define them first, and use them consistently.  (And you sacrifice
automatic conversion by using wild cards, of course, unless you come
up with a customized converter.)
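
A customized converter of the kind just mentioned could start by
expanding the wild cards from a table of definitions.  The sketch below
is in Python; the table contents, the name expand_wildcards and the
choice to raise an error on undefined wild cards are all assumptions
made for illustration.  It does not cover the case where [# $ %] is
used as a modifier character.

```python
import re

# Hypothetical definitions for one particular language.  CPA itself
# assigns these symbols no values, so every entry here is invented.
WILDCARDS = {"#": "{ts}", "^#": "^h^w"}

# A wild card is one of # $ %, optionally preceded by ^ or _ when it
# is used as a diacritic.
WILDCARD_RE = re.compile(r'[\^_]?[#$%]')

def expand_wildcards(text, table):
    """Replace each wild card by its definition; fail if it is undefined."""
    def repl(m):
        sym = m.group(0)
        if sym not in table:
            raise KeyError(f"wild card {sym!r} used but not defined")
        return table[sym]
    return WILDCARD_RE.sub(repl, text)
```

Raising an error for an undefined wild card enforces the rule that
wild cards must be defined before they are used.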

Of the ASCII-IPA schemes, Herman Miller's KPA comes closest to what
I consider a good ASCII-IPA encoding, and CPA owes a lot to KPA.
In fact, I took over the KPA consonant symbols with very few changes:
I placed the modifier characters (characters used to create more
symbols) [" *] before the letters instead of after them; e.g. KPA [N"]
corresponds to CPA ["N].  I also switched the alveolar flap and trill
symbols, because I feel the flap is the simpler (and more frequent)
sound, and because [r.] is a retroflex flap rather than a trill in KPA
(and CPA). The suprasegmentals and diacritics also mostly follow KPA
with a few changes.

The vowel symbols are not based on KPA or any other pre-existent
encoding scheme, but designed according to rather simple basic rules:
the symbols [i e E a 6 O o u] for the primary cardinal vowels; ["]
reverses fronting (e.g. ["o] is close-mid front rounded and [o] is
close-mid back rounded) and [*] centralizes.  This is simple and
consistent, and borrows from similar conventions used elsewhere
(e.g. in Finno-Ugristics).  To this, I added the symbols [y A & @]
simply because just about everyone uses them to such an extent that
there is hardly any reason not to follow the de-facto convention.
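
The two modifier rules are regular enough to be written down as a small
program.  This is a sketch in Python, not a reference implementation:
the feature labels and the name vowel_features are my own choices for
illustration, and the rounding values of the base symbols are read off
the vowel chart in the specification.  The symbols [& @ I Y U] are left
out because they do not follow the rules, and the chart places [*a]
slightly higher than the rule predicts, so that result is approximate.

```python
# Base symbols as (height, backness, rounded), following the vowel chart.
BASE = {
    "i": ("close", "front", False),
    "e": ("close-mid", "front", False),
    "E": ("open-mid", "front", False),
    "a": ("open", "front", False),
    "6": ("open", "back", True),
    "O": ("open-mid", "back", True),
    "o": ("close-mid", "back", True),
    "u": ("close", "back", True),
}

# Alternative symbols listed in the specification.
ALIASES = {"y": '"u', "A": '"a'}

def vowel_features(symbol):
    """Apply the ["] and [*] rules to a base vowel symbol."""
    symbol = ALIASES.get(symbol, symbol)
    if symbol[:1] in ('"', '*'):
        modifier, base = symbol[0], symbol[1:]
    else:
        modifier, base = "", symbol
    if base not in BASE:
        raise ValueError(f"not a rule-based vowel symbol: {symbol!r}")
    height, backness, rounded = BASE[base]
    if modifier == '"':
        # ["] reverses fronting and keeps height and rounding.
        backness = "back" if backness == "front" else "front"
    elif modifier == "*":
        # [*] centralizes.
        backness = "central"
    return height, backness, rounded
```

For instance, vowel_features('"o') returns ('close-mid', 'front', True)
and vowel_features('*i') returns ('close', 'central', False).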

Well, enough of the prolegomena, here's the stuff!


                CPA - An ASCII-based phonetic alphabet
                ======================================
                             Version 1.1
                       (C) 2001 Jörg Rhiemeier
                       Specification of symbols

Consonants
----------

       blb.  lbd.  dnt.  alv.  pav.  rfl.  plt.  vel.  uvl.  phr.  glt.

stop   p  b              t  d        t. d. c  J  k  g  q  Q        ?
nasal     m     M           n           n.   "n     N    "N
flap                        r           r.
l.flap                     *l          *l.
trill    "B                "r          "r.               "R
fric   P  B  f  v  T  D  s  z  S  Z  s. z. C "j  x  G  X  R  H  9  h "h
l.fric                  "l "Z       "l."Z.
appr           "v          *r          *r.    j    "w
l.appr                      l           l.    L    "L

click  p!          T!    t!                c!
l.click                  l!
impl      b`                d`                J`    g`    Q`
ejec   p`                t`                c`    k`    q`

Symbols for clicks, implosives and ejectives at other points of
articulation can be easily created analogously.
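
The regular shape of these symbols suggests that a transcription can be
split into segments unambiguously.  The following Python sketch uses a
grammar that I have inferred from the chart above and that is an
assumption, not a tested part of CPA: an optional prefix ["] or [*], one
base character, an optional [.], then an optional [`] or [!].  It
handles neither diacritics, suprasegmentals nor {..} groups, and the
name split_segments is hypothetical.

```python
import re

# prefix modifier, base character, optional ".", optional "`" or "!"
SEGMENT = re.compile(r'["*]?[A-Za-z?9]\.?[`!]?')

def split_segments(text):
    """Split a string of plain segment symbols into a list of symbols."""
    pos, out = 0, []
    while pos < len(text):
        m = SEGMENT.match(text, pos)
        if m is None:
            raise ValueError(
                f"unexpected character at offset {pos}: {text[pos]!r}")
        out.append(m.group(0))
        pos = m.end()
    return out
```

Because the modifiers [" *] always precede the base character and
[. ` !] always follow it, a left-to-right scan never has to backtrack
across a segment boundary.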

 W  voiceless labial-velar fricative
 w  voiced labial-velar approximant
"y  voiced labial-palatal approximant
"H  voiceless epiglottal fricative
"9  voiced epiglottal fricative
"?  epiglottal plosive
"c  voiceless alveo-palatal fricative
"z  voiced      "      "        "
"S  simultaneous S and x

{kp} double articulation
{ts} affricate

Vowels
------
         Front     Central    Back

Close    i "u       *i *u    "i  u
              I  Y          U
Close-mid   e "o     *e *o   "e  o
                        @
Open-mid       E "O   *E *O  "E  O
                &       *a
Open              a "6       "a  6

Alternative symbols:  [y]=["u]; [A]=["a].

Suprasegmentals
---------------

'      primary stress
,      secondary stress
:      long
;      half-long
^(     extra-short
-      syllable break
|      minor (foot) group
||     major (intonation) group
=      linking (absence of a break)
5 ^"   extra high
4 ^'   high
3 ^-   mid
2 ^`   low
1 ^=   extra low
>      downstep
<      upstep
24 ^v  rising (optionally, [/])
42 ^^  falling (optionally, [\])
<<     global rise
>>     global fall

Diacritics
----------

_h  voiceless         _:  breathy voiced     _d  dental
_v  voiced            _~  creaky voiced      _a  apical
^h  aspirated         _m  linguolabial       _l  laminal
_)  more rounded      ^w  labialized         ~   nasalized
_(  less rounded      ^j  palatalized        ^n  nasal release
_+  advanced          ^G  velarized          ^l  lateral release
_-  retracted         ^9  pharyngealized     ^7  no audible release
^:  centralized       ^~  velarized or pharyngealized
^x  mid-centralized   _^  raised
_|  syllabic          _V  lowered
_`  non-syllabic      _<  advanced tongue root
^r  rhoticity         _>  retracted tongue root

Additional symbols not included in IPA
--------------------------------------
(mainly from Ladefoged & Maddieson,
_The Sounds of the World's Languages_)

0      zero string of segments
_M     labiodental
_.     apical retroflex (if necessary to distinguish
                         from sub-apical retroflex)
^m ^N  prenasalized (^mb ^nd ^"nJ ^Ng, etc.)
^^     closed post-alveolar (hissing-hushing) fricatives
^"h    alternative representation of breathy voice
^x ^G  affricate click release
^?     glottal click release
_=     strident vowels (double tilde below)
_w     simple labialization (not velarized)
7      velarized l (X-SAMPA [5])
*I *U  central vowels slightly lower than [*i] and [*u]
       (X-SAMPA [I\], [U\])
*i     U+027F reversed r with fishhook
       (resembles iota turned 180 degrees)
i.     U+0285 squat reversed esh
       (represents a retroflex vowel)
^s     sibilant fricative; no IPA equivalent
^o     whistled; no IPA equivalent

Wild card symbols
-----------------
These symbols can be used for whatever one needs a symbol for;
they must be defined first, otherwise they are meaningless.

# $ %  arbitrary phonetic symbols, e.g.,
       as shorthand for complex but frequent segments,
       or human-impossible sounds in non-human languages
       (may also be used as modifier characters instead
        to create a larger number of new symbols)
^# _#  arbitrary diacritics
^$ _$  ditto
^% _%  ditto

...brought to you by the Weeping Elf and the letter "ö"