Re: Phone frequencies
From: Alex Fink <000024@...>
Date: Sunday, September 7, 2008, 2:17
On Sat, 6 Sep 2008 17:46:14 -0400, Logan Kearsley <chronosurfer@...>
wrote:
>I used to have an IPA table that included the frequency of each phone
>among world languages- which phones occur in 90% of all languages,
>which phones occur in 80% of languages, which phones occur in only 5%
>of languages, etc. But I seem to have lost it, and I can't find
>anything like that on line. Anybody know where I could get a table or
>a list with frequencies for different phones among world languages?
Wouldn't you know it, I was _just_ looking for the very same thing. UPSID
(the UCLA Phonological Segment Inventory Database) does nearly exactly this,
and there's an interface to it at
http://web.phonetik.uni-frankfurt.de/upsid.html .
Use "find certain sounds and languages that have them", option #5; below the
output it gives a table of the frequency of each phone across the phonologies
in its database. The table isn't sorted, but you can do that yourself.
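Once the frequency table has been copied out as (phone, frequency) pairs, sorting takes one call. A minimal Python sketch: the four pairs are borrowed from the corrected consonant figures in the excerpt below, the function name is made up for illustration, and extracting the pairs from the web page is left out since that depends on the page's output format.

```python
# (phone, frequency) pairs; values taken from the consonant table in this post.
rows = [("k", 0.8936), ("n", 0.9935), ("m", 0.9424), ("t", 0.9436)]


def sort_by_frequency(pairs):
    """Return the pairs ordered from most to least frequent."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for phone, freq in sort_by_frequency(rows):
    print(f"{phone:<6} {freq:.4f}")
```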
Below I excerpt from an offlist message on the Glossotechnia discussion
about this.
[excerpt begin]
For consonants it's got the irritating feature that dentals and
alveolars and unspecified dental/alveolars are all counted separately,
though. I've corrected for that by taking the unspecified counts and
multiplying those by 14/5, and discarding the other two sorts -- this
is indefensibly hacky, when I could've done the summation, but it was
quick. That gives the following top of the frequency list (warning,
monospace table ahead):
n      .9935    g      .5610    k_h    .2284    dz)    .1240
t      .9436    N      .5255    p_h    .2239    G      .1220
m      .9424    ?      .4789    r*     .2234    c      .1197
k      .8936    tS)    .4169    v      .2106    B      .1197
l      .8445    S      .4146    x      .2084    q      .1153
j      .8381    f      .3991    4      .1613    tS)_h  .1131
s      .8381    r      .3167    ts)_h  .1551    b_<    .1086
p      .8315    J      .3126    t_>    .1490    mb)    .1064
w      .7361    t_h    .3041    K      .1490    ts)_>  .1056
b      .6364    ts)    .2794    k_>    .1397    nd)    .1056
h      .6186    z      .2669    Z      .1353
d      .5650    dZ)    .2506    k_w    .1330
and no other sounds in more than one language in ten. r* was glossed
in the list as "voiced dental/alveolar r-sound", whatever we make of
that.
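For comparison, here is a rough Python sketch of the 14/5 scaling hack next to the summation it stands in for. The function names and the sample counts are hypothetical, not UPSID figures, and the number of languages in the sample is passed in rather than assumed.

```python
def scaled_frequency(unspecified, total, factor=14 / 5):
    """Quick hack: scale the unspecified dental/alveolar count and ignore
    the separately listed dental and alveolar counts."""
    return unspecified * factor / total


def summed_frequency(dental, alveolar, unspecified, total):
    """Summation: add the three counts. This overcounts any language that
    lists both a dental and an alveolar version of the same sound."""
    return (dental + alveolar + unspecified) / total


# Hypothetical counts for a single consonant in a 400-language sample.
print(scaled_frequency(unspecified=100, total=400))
print(summed_frequency(dental=30, alveolar=50, unspecified=100, total=400))
```

The two numbers agree only to the extent that the unspecified category makes up about 5/14 of all occurrences, which is what the scaling factor takes for granted.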
For vowels the parallel irritation is that e.g. /e/ and /E/ and
indifferent /e/~/E/ are counted separately; I've corrected (slightly
less indefensibly) by dropping the indifferents and multiplying the
others by 11/7, but special-cased /@/ and left it alone. This gives
i      .8714    I      .1641    a_":   .0754
a_"    .8692    U      .1463    e:     .0732
u      .8182    1      .1353    e~     .0627
E      .6481    E~     .1219
O      .5645    O~     .1116
o      .4565    o~     .0941
e      .4320    M      .0909
a_"~   .1840    i:     .0887
i~     .1818    &      .0865
@      .1685    o:     .0836
u~     .1641    u:     .0798
and no other sounds in more than one language in sixteen. My /a_"/
was just written /a/ in the UPSID but they called it unambiguously a
low central vowel (this is a hole in the IPA more than anything).
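The vowel correction can be sketched the same way in Python. Which symbols receive the 11/7 factor is an assumption supplied by the caller (the post names only /e/ and /E/ as examples), /@/ is left alone simply by keeping it out of that set, and the counts below are hypothetical rather than UPSID figures.

```python
def corrected_vowel_frequencies(counts, total, scaled, factor=11 / 7):
    """Return {symbol: frequency}. Symbols listed in `scaled` have their
    count multiplied by `factor`; every other symbol keeps its raw count."""
    return {
        symbol: (count * factor if symbol in scaled else count) / total
        for symbol, count in counts.items()
    }


# Hypothetical counts in a 100-language sample; "@" is deliberately unscaled.
freqs = corrected_vowel_frequencies(
    counts={"e": 35, "@": 50},
    total=100,
    scaled={"e"},
)
for symbol, freq in freqs.items():
    print(f"{symbol:<6} {freq:.4f}")
```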
[excerpt end]
Alex