Theiling Online    Conlang Mailing List HQ

Re: Iterative conlang design with corpus analysis, Or, Build one to throw away

From:David J. Peterson <dedalvs@...>
Date:Tuesday, April 18, 2006, 22:59
<<
snip Jim's explanation
 >>

Sounds like a fantastic project!  Will you be webifying it step-by-
step?

<<
1. How to count length of words?  By phonemes, or syllables,
or some weighted count where vowels count more than
continuant consonants which count more than plosives?
 >>

I'd recommend counting neither by phonemes nor necessarily by
conventional syllable-weight measurements. Of course, it depends a
lot on your phonology. I believe I read that this is your word
shape:

C(C)V(C)

Is this right?  If so (and you're counting both words and phrases),
then I'd assign a value to each word and add up the values of the
n words in a given phrase (where n can be anything from 1 up); that
total would be your measure.  For the values of a word's parts, I'd
recommend the following:

Onset = 0
Main V = 1
Coda C = 1 (modulo your preference for stress rules--see below)
Onset Cluster C = 0.5 (or less, but not 0)

I don't buy that "top" and "stop" are the same length, and I've
heard rumors of papers somewhere out there that argue the same
thing.  While onset clusters clearly don't play the role that coda
consonants and clusters do when it comes to stress, they do play
some role in how long a word takes to pronounce.
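To make the weights above concrete, here is a minimal Python sketch that scores words and phrases with them. It assumes each phoneme is written as a single character, a hypothetical vowel set `aeiou`, and syllables within a word separated by `.`; the function names are illustrative only.

```python
# Weights suggested in the list above.
ONSET = 0.0          # a single onset consonant
ONSET_CLUSTER = 0.5  # each additional consonant in an onset cluster
VOWEL = 1.0          # the syllable's main vowel
CODA = 1.0           # each coda consonant

# Hypothetical vowel inventory; one character per phoneme is assumed.
VOWELS = set("aeiou")


def syllable_weight(syl: str) -> float:
    """Score one C(C)V(C) syllable (raises StopIteration if it has no vowel)."""
    v = next(i for i, ch in enumerate(syl) if ch in VOWELS)
    onset, coda = syl[:v], syl[v + 1:]
    weight = VOWEL + CODA * len(coda)
    if onset:
        weight += ONSET + ONSET_CLUSTER * (len(onset) - 1)
    return weight


def word_weight(word: str) -> float:
    """Sum the syllable weights; syllables are assumed to be split by '.'."""
    return sum(syllable_weight(s) for s in word.split("."))


def phrase_weight(phrase: str) -> float:
    """Sum the weights of the n words in a whitespace-separated phrase."""
    return sum(word_weight(w) for w in phrase.split())


if __name__ == "__main__":
    print(word_weight("top"))         # 2.0
    print(word_weight("stop"))        # 2.5
    print(phrase_weight("stop top"))  # 4.5
```

Under these weights "stop" comes out heavier than "top" (2.5 vs. 2.0), which is the distinction argued for above.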

Coda consonants don't all have to carry the same weight, though
which ones are lighter is language-specific.  If I were doing
Spanish, for example, the codas /-n/ and /-s/ wouldn't get the same
weight as other coda consonants.  They would count for something
(maybe 0.5), but not as much as /-r/, /-D/, /-l/, etc.  For stress,
/-n/ and /-s/ are treated almost as if they weren't there (a word
ending in either is stressed like a vowel-final word), and when
speaking Spanish they can *feel* absent in words like "tienes",
"tienen", "carros", etc.

As you've described your project, I think it'll just take practice
speaking the phonology you come up with.  After a while it should
become clear whether there's a class of consonants that are less
weighty word- or syllable-finally than the rest.  If so, the word
/stan/ might be less weighty than /stal/, which would give you more
flexibility toward the end of your frequency list (say, the 100th
most common word can't get a CV root, but it can still get a CVW
root, where W = weak coda consonant).
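One way to try out the weak-coda idea is to give the W class its own coda weight and then rank root shapes by total weight, so the lightest shapes can go to the most frequent words. This Python sketch assumes W consonants weigh 0.5 and ordinary codas 1.0; treating /n/ and /s/ as the weak class is a hypothetical choice borrowed from the Spanish example, and the shape notation (`C`, `V`, `W`) follows the message.

```python
# Hypothetical inventory: which coda consonants count as "weak" (W).
VOWELS = set("aeiou")
WEAK_CODAS = {"n", "s"}

WEAK_CODA = 0.5      # a weak coda consonant (W)
CODA = 1.0           # any other coda consonant
ONSET_CLUSTER = 0.5  # each extra onset consonant beyond the first


def syllable_weight(syl: str) -> float:
    """Score a C(C)V(C) syllable, treating WEAK_CODAS as lighter codas."""
    v = next(i for i, ch in enumerate(syl) if ch in VOWELS)
    onset, coda = syl[:v], syl[v + 1:]
    extra_onset = max(len(onset) - 1, 0) * ONSET_CLUSTER
    coda_total = sum(WEAK_CODA if c in WEAK_CODAS else CODA for c in coda)
    return 1.0 + extra_onset + coda_total


def shape_weight(shape: str) -> float:
    """Score an abstract root shape such as 'CCVW' with the same weights."""
    v = shape.index("V")
    onset, coda = shape[:v], shape[v + 1:]
    extra_onset = max(len(onset) - 1, 0) * ONSET_CLUSTER
    coda_total = sum(WEAK_CODA if c == "W" else CODA for c in coda)
    return 1.0 + extra_onset + coda_total


def ranked_shapes(shapes: list[str]) -> list[str]:
    """Order root shapes from lightest to heaviest (ties keep input order)."""
    return sorted(shapes, key=shape_weight)


if __name__ == "__main__":
    print(syllable_weight("stan"), syllable_weight("stal"))  # 2.0 2.5
    print(ranked_shapes(["CV", "CVW", "CVC", "CCV", "CCVW", "CCVC"]))
    # ['CV', 'CVW', 'CCV', 'CVC', 'CCVW', 'CCVC']
```

In this ordering, once the CV slots run out, CVW (and CCV) are the next-lightest shapes, which is the kind of flexibility described above.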

I'll venture a guess at another one:

<<
3. What should be the criterion for a phrase occurring "often enough"
in the corpus to deserve its own root word?
 >>

I'd suggest that you'll have to see it to know.  : \  I don't think you
can come up with a metric beforehand.  It sounds like it's going to be
a somewhat time-consuming task, especially if your corpus is going
to have a lot of talk about "hanging chads" or "gerrymandering".
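Since no threshold can be fixed in advance, one practical approach is to surface the most frequent multi-word phrases and judge them by eye. Below is a minimal Python sketch using only the standard library; the tokenizing regex, the 2-to-3-word range, and the function names are assumptions, and the code only ranks candidates rather than deciding which ones deserve a root.

```python
import re
from collections import Counter


def phrase_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive words (n-gram) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def candidate_phrases(text: str, max_n: int = 3, top: int = 20) -> list[tuple[int, str]]:
    """Return the most frequent 2..max_n word phrases for manual review."""
    rows = []
    for n in range(2, max_n + 1):
        for phrase, count in phrase_counts(text, n).most_common(top):
            rows.append((count, " ".join(phrase)))
    # Highest counts first; no cutoff is applied beyond `top`.
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    sample = "the hanging chads were counted and the hanging chads were disputed"
    for count, phrase in candidate_phrases(sample, top=5):
        print(count, phrase)
```

The output is a ranked list to read through, which fits the point that the cutoff has to be judged from what actually shows up in the corpus.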

Anyway, it sounds like a cool project!  In fact, it's something I'd
like to try too, if I didn't happen to lack any programming skill or
knowledge of how to use a corpus...

-David
*******************************************************************
"sunly eleSkarez ygralleryf ydZZixelje je ox2mejze."
"No eternal reward will forgive us now for wasting the dawn."

-Jim Morrison

http://dedalvs.free.fr/
