
Re: Using word generators (was Re: Semitic root word list?)

From: H. S. Teoh <hsteoh@...>
Date: Tuesday, January 9, 2007, 21:28
On Tue, Jan 09, 2007 at 01:03:16PM -0800, David J. Peterson wrote:
[...]
> H. S. Teoh:
> <<
> I agree. English itself shows this: J, Z, and X occur very rarely
> compared to, say, E. This does not seem to be a problem in practice.
> :-)
> >>
>
> Right, but this is a slightly different matter. Would you contend
> that the sounds /dZ/, /z/ and /ks/ rarely occur in English? /z/, at
> the very least, occurs in...what, almost 50% of plural nouns?
I think /ks/ is still relatively rare, although you're right: there is
a discrepancy between the orthography and the actual phonemes. Mea
culpa.
> The problem is that my alphabet is pretty much phonemic. Unless I use
> the letter for the bilabial click, there is no bilabial click.
True that.
> Jörg wrote:
> <<
> Second, you can easily avoid and correct imbalances by looking at what
> you have already invented, and use the underrepresented phonemes more
> frequently and the overrepresented ones less frequently as you
> progress.
> >>
>
> I don't know about the "easily" part... I wonder: is there a simple
> way to calculate letter frequency in one's vocabulary? I bet
> there probably is, but not for folks like me that use a word processing
> document for a dictionary... I'd switch to a spreadsheet, but it's
> just so ugly... And too practical! ;)
[...]

Text files forever! ;-)

I keep the Ebisédian lexicon as a set of LaTeX source files, and have
written a utility for parsing them and building a lookup table out of
them. I believe I've actually written a frequency analysis function
for it, too. :-)

For Tatari Faran, since the orthography is not terribly ugly, I keep
the lexicon as a formatted plaintext file, with a Perl script for doing
lexicon searches and various such things. It already has a way to
output a list of words (bare word, no IPA, no definition, etc.), which
should be easy to filter through another Perl script that does
frequency analysis on it.

The problem with using a full-fledged word-processor format like .doc
is that (1) it's binary, and therefore very difficult to process
automatically with scripts; (2) it contains formatting codes in
addition to the text, making extraction of the words rather tedious;
and (3) in the case of MS Word, the format is proprietary, so you have
to reverse-engineer it to get any information out of it. The only
recourse there is to write a VB script or some such that does what
needs to be done within Word itself. I suspect that's still doable,
but I'd much rather use Perl's ready-made arsenal of text-parsing
features than implement a lexical analyser in VB. :-P

Unfortunately, in either case, programming expertise seems to be a
requirement, unless you use a commonly-used format like Shoebox, for
which others have already written such utilities. (I seem to remember
Gary proposing some sort of conlanging system recently; that would
work, too. But then porting everything over is always a tedious job.)
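For what it's worth, the frequency-analysis step needn't be more than
a few lines of Perl. Here's a minimal sketch (not the actual script I
use; it assumes the word list arrives one word per line, in a plain
ASCII orthography):

    #!/usr/bin/perl
    # letterfreq.pl (hypothetical name): count letter frequencies in a
    # word list read from stdin or from files named on the command line.
    use strict;
    use warnings;

    my %freq;
    my $total = 0;
    while (my $line = <>) {
        chomp $line;
        # Fold case and count each letter; the [a-z] class skips
        # spaces, hyphens, etc. Widen it for digraphs or diacritics.
        for my $ch (split //, lc $line) {
            next unless $ch =~ /[a-z]/;
            $freq{$ch}++;
            $total++;
        }
    }

    # Print letters from most to least frequent, with percentages.
    for my $ch (sort { $freq{$b} <=> $freq{$a} } keys %freq) {
        printf "%s  %6d  %5.2f%%\n",
            $ch, $freq{$ch}, 100 * $freq{$ch} / $total;
    }

Pipe the bare word list into that and you get an at-a-glance picture
of which letters are over- or under-represented, which is exactly what
Jörg's rebalancing trick calls for.

T

--
Let's not fight disease by killing the patient. -- Sean 'Shaleh' Perry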