
(LONG) on conlang software (was Re: Lojban program and conlang software ideas)

From:Brook Conner <nellardo@...>
Date:Monday, May 8, 2000, 16:21
Peter Clark wrote:
[... details from freshmeat snipped ....]
> Anywho, if this interests anyone, check out:
>
> Next order of business: has anyone considered starting up a Open
> Source/Free Software (take your pick of terms)
I'd settle for "open source," but putting general tools for conlanging under the GPL would probably be a good thing - these are fundamentally our own ideas we're creating here, and providing someone else the ability to profit off of our own ideas (esp. when we know the commercial possibilities of the languages themselves are somewhat limited) just doesn't sit well with me. But then, I'm something of an anarchist (at least, when I feel optimistic about people :-)
> line of conlang software and tools? How many programmers do we have
> on the list? I really wish that
I suspect there's a fair amount - for me, at least, conlangs and proglangs are simply different points in a space.
> a.) I had more time
Hear hear! :-)
> and b.) knew a decent programming language (I am still > teaching myself C in my officially non-existant free time),
Ugh - I can't think of a PL still in widespread use that could possibly be much worse for conlang work than C. If you want elegance and power, try Haskell or (if you can stand the parens) Scheme. Both are widely cross-platform, with free compilers and interpreters for *everything* - win, mac, unix, even palmtops (e.g., PocketScheme). Haskell is an especially beautiful proglang; its facility with lists and strings is really quite nice and *readable* (if you name your functions something reasonable), and its list comprehensions are particularly nice. If you really *must* have something with a complex lexical structure that explores the farthest reaches of the printable parts of ASCII, then at least try Perl (which, IMNSHO, is a blight upon the world of proglang design - which doesn't change the fact that it is a useful tool). Perl's extensive support for string processing is well-suited to conlanging. Someone else recommended Python, which is also a good choice, as is Java (though it has some of the lexical complexity of the whole ALGOL family: C, C++, Perl, and various "practical" scripting languages). [...]
> o Random word generator - I have found several on the web, but aside from
> LangMake, these are primitive at best. Of course, I do my word generation
> transformation feature), but why not take a good thing and make it better?
Just to plug my favorite PL :-), here's something simple to generate lojban gismu:

  gismu = ccvcvGismu ++ cvccvGismu

  ccvcvGismu = [ [c1, c2, v1, c3, v2]
               | c1 <- lojbanConsonants, c2 <- lojbanConsonants
               , c3 <- lojbanConsonants
               , v1 <- lojbanVowels, v2 <- lojbanVowels ]

  cvccvGismu = [ [c1, v1, c2, c3, v2]
               | c1 <- lojbanConsonants, c2 <- lojbanConsonants
               , c3 <- lojbanConsonants
               , v1 <- lojbanVowels, v2 <- lojbanVowels ]

"gismu" is the list of ccvcv gismu, followed by the cvccv gismu. Each of the sublists of gismu is defined as "the list of all lists of five letters such that c1, c2, and c3 are lojban consonants and v1 and v2 are lojban vowels", with the letters in the appropriate order. One of the nice things about Haskell is what I *didn't* say - lojbanVowels would be a list of characters, e.g., "aeiou", but if it were a list of something else, the code wouldn't change. You can do the same kind of stuff with more abstract stuff like phonemes, tones, what have you.

Want something more general? Here's one that simply needs a function that returns true if the word is part of the language and false if it isn't:

  wordlist characters isAWord = [ a | a <- perms characters, isAWord a ]

For lojban, isAWord would be something like this:

  isAWord [a, b, c, d, e] =
      consonant a
      && ((consonant b && vowel c) || (vowel b && consonant c))
      && consonant d && vowel e

"perms" is the function that returns a list of all possible permutations of a list. Obviously, this is somewhat brute force, but not bad for a one-liner :-)
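To make the gismu generator above actually compile, you only need to supply the two phoneme lists. Here's a self-contained sketch; the consonant and vowel inventories are my own guess at lojban's, and real gismu also have cluster constraints this ignores, so it overgenerates:

```haskell
-- Assumed lojban phoneme inventories (y omitted; may be incomplete).
lojbanConsonants :: String
lojbanConsonants = "bcdfgjklmnprstvxz"

lojbanVowels :: String
lojbanVowels = "aeiou"

-- All CCVCV-shaped candidates.
ccvcvGismu :: [String]
ccvcvGismu = [ [c1, c2, v1, c3, v2]
             | c1 <- lojbanConsonants, c2 <- lojbanConsonants
             , v1 <- lojbanVowels
             , c3 <- lojbanConsonants, v2 <- lojbanVowels ]

-- All CVCCV-shaped candidates.
cvccvGismu :: [String]
cvccvGismu = [ [c1, v1, c2, c3, v2]
             | c1 <- lojbanConsonants
             , v1 <- lojbanVowels
             , c2 <- lojbanConsonants, c3 <- lojbanConsonants
             , v2 <- lojbanVowels ]

-- CCVCV candidates first, then CVCCV candidates.
gismu :: [String]
gismu = ccvcvGismu ++ cvccvGismu
```

With 17 consonants and 5 vowels, each shape yields 17^3 * 5^2 = 122,825 candidates.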
> o Dictionary program - something where the user could type in the word and > the translation, and the program would insert generate a Conlang<->Natlang > dictionary. It would definitely have to handle multiple meanings; grabbing > an example from Russian, if I type in "jazyk" for the Russian word and > "tongue" and "language" for the English definitions, I should be able to > find "jazyk" under both "tongue" and "language" in the English section.
This part is relatively simple, if the data is in the right format.... Let's say that dictionary entries are a pair of lists - possible meanings in lang A on one side, possible meanings in lang B on the other. A list of such pairs is the raw data for a dictionary. So (["jazyk"], ["tongue", "language"]) for the example above....

  langAtoLangB :: [ ([a], [b]) ] -> [ (a, [b]) ]
  langAtoLangB [] = []   -- empty dictionary
  langAtoLangB ((a, b) : rest) = [ (x, b) | x <- a ] ++ langAtoLangB rest

And similarly for the other direction. Sorting is a standard library function, though we need a little function for ordering:

  order (a, _) (b, _) = compare a b

  aToBDictionary = sortBy order (langAtoLangB rawData)
> It > should also work the other way as well; if I type in "probovat'" and later > "starat'sja", I should find both under "try." It should also be able to > indicate special forms, like "djen'gi" becoming "djenjeg" in the genitive > plural.
The simple routine above wouldn't note that "djenjeg" was genitive plural, but you could certainly include it as a possible translation of "money" (djen'gi, unless I've forgotten more Russian than I thought).
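Here's a self-contained sketch of the dictionary flip described above, covering both directions (names like RawEntry are my own, not a fixed schema):

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- (words in language A, their possible glosses in language B)
type RawEntry = ([String], [String])

-- Give every A-side word its own headword line.
langAtoLangB :: [RawEntry] -> [(String, [String])]
langAtoLangB entries = [ (x, bs) | (as, bs) <- entries, x <- as ]

-- The other direction is just the same function on swapped pairs.
langBtoLangA :: [RawEntry] -> [(String, [String])]
langBtoLangA = langAtoLangB . map (\(as, bs) -> (bs, as))

-- Sort headwords to produce the finished A->B dictionary.
aToBDictionary :: [RawEntry] -> [(String, [String])]
aToBDictionary = sortBy (comparing fst) . langAtoLangB
```

So with raw data [(["jazyk"], ["tongue", "language"]), (["probovat'", "starat'sja"], ["try"])], the B-to-A direction files both Russian verbs under "try", as wanted.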
> Plus, it should have an Export To HTML feature, for that web page > that I keep meaning to create... :)
Pretty printing is an exercise for the reader :-)

More seriously, the problem in a dictionary generator is more one of specifying the data format than anything else - too much specification, and you might as well write the dictionary by hand; too little, and it isn't so useful. So a more general dictionary generator would include:

* many-to-many word mappings
* automatic generation of declensions and/or conjugations (which may require tagging words by part of speech, etc., or may require rules for determining such), with provisions for exceptions for irregular forms
* template-based generation of output (e.g., XML with arbitrary style sheets - it just occurred to me that a suitably robust style sheet processor might be able to do the same as the Haskell code above, given the right style sheet)
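One way to pin down that "right amount" of data format is an entry type that carries part of speech and explicit irregular forms, leaving everything else to be derived by rule. This is only an illustrative sketch; all the names are my own:

```haskell
-- Hypothetical richer dictionary entry: regular forms are generated
-- from the part of speech, irregular forms are listed explicitly.
data PartOfSpeech = Noun | Verb | Adjective
  deriving (Show, Eq)

data DictEntry = DictEntry
  { headword   :: String
  , pos        :: PartOfSpeech
  , glosses    :: [String]
  , irregulars :: [(String, String)]  -- (form name, surface form)
  } deriving Show
```

The "djen'gi" example then becomes DictEntry "djen'gi" Noun ["money"] [("genitive plural", "djenjeg")], and a declension engine would consult irregulars before applying the regular rules.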
> o Transformer (I can't think of a better name--it's getting late) - this > would apply regular sound changes across the board.
A series of rules that replace sounds? And of course, the ability to have those rules be conditional on context - e.g., this vowel only changes if preceded by this kind of consonant..... I presume this is for automatic generation of the kind of evolutionary threads seen in Tolkien's development of Quenya and Sindarin from "earlier" forms.
> o Grammar generator - This would be incredibly cool if someone could > actually manage to pull it off. The program would run through a list of > different grammar options (nominative/ergative/active/mixed; SVO, SOV, > VSO, etc.; isolating/agglutinating/fusional/polysynthetic; and so on) and > spit out a grammar. Of course, listing all the millions of different > variables would be a nightmare...
Now this one sounds rather interesting, and programmatically somewhat more complex, especially if you want it to generate *parsers* from the particular permutations. Hmmm. This is a neat one. I want to chew on it for a while. Anyone want to suggest more variables/options here?

Okay, having thought some more (but not enough), it seems that you first want to divide the options up into orthogonal "dimensions" (SVO etc. being one). Each point on any given dimension corresponds to a "mini-parser" - a combinator of some sort. Functional composition of the mini-parsers produces a full parser. Getting the types right in this would probably be a real pain, especially since they're so abstract. E.g., what does SVO expect? A simple list of words makes it too hard for SVO to decide whether the sentence parses (it would have to check parts of speech, etc.). No, SVO needs to be passed "words" that have already been tagged by part of speech, normalized to have all words in base form, with differences such as affixes, prepositions, et al. factored out. Let's see if this makes sense:

  parseConlang = findWords => breakIntoSVO => normalizeWords
                 => identifyPartOfSpeech => checkWordOrder

"findWords" takes a string and returns a list of words and punctuation:

  "I kissed the boy." becomes ["I", "kissed", "the", "boy", "."]

"breakIntoSVO" groups that list so that all words that are part of the subject are together, the verb likewise, etc.:

  ["I", "kissed", "the", "boy", "."] becomes
  [["I"], ["kissed"], ["the", "boy"], ["."]]

"normalizeWords" turns modified forms into base forms:

  [["I"], ["kissed"], ["the", "boy"], ["."]] becomes
  [["I"], ["kiss", "-ed"], ["the", "boy"], ["."]]

"identifyPartOfSpeech" labels normalized words by part of speech:

  [["I"], ["kiss", "-ed"], ["the", "boy"], ["."]] becomes
  [[pronoun, "I"], [verb, "kiss", "-ed"],
   [noun, [article, "the"], [noun, "boy"]], [punctuation, "."]]

"checkWordOrder" then sees if the pieces are in the right order - e.g., for SVO: noun, verb, noun, punct. And so on......
> o Simulator - Since I am now officially dreaming, imagine a simulator > where the computer takes two or more languages and builds a simulation of > how they would change and interact with each other. How close would the > computer come to Brethenig? What would have happened if Alexander the > Great had conquered Japan and left a significant speakers of Greek (or > Macedonian--is there a difference?) in Kyoto?
If you had the generator mentioned previously, then you'd have the basis for a genetic algorithm for conlangs. The different variables become "genes" in the "genome". Mix and match according to some sort of objective (e.g., the "dominant" language is more likely to be selected). You'd have to do something similar for words, borrowings, and sound changes.
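In the smallest possible sketch of that idea, a "genome" is a fixed-length list of feature choices and crossover picks each gene from one parent or the other; the coin flips are passed in so the code stays pure (all names here are illustrative):

```haskell
-- A language as a list of feature choices, e.g.
-- [word order, alignment, morphology type, ...], each coded as an Int.
type Genome = [Int]

-- Uniform crossover: for each gene, a coin flip picks which parent
-- contributes it. A selection bias toward the "dominant" language
-- would just skew the coin.
crossover :: [Bool] -> Genome -> Genome -> Genome
crossover coins a b = zipWith3 pick coins a b
  where pick takeA x y = if takeA then x else y
```

Words, borrowings, and sound changes would need richer genes than Ints, but the mix-and-match skeleton is the same.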
> Ok, well these last two are probably unreasonable, but since the > first three already have a precedent in one form or another, they should > not prove too difficult.
I think the last two are possible - I think no one's bothered to do it before because, well, there just aren't that many people partaking of the Secret Vice.
> I think it would be nice if the core could be in > ANSI C or some other standardized language that is available across the
Uck. It would be a maintenance nightmare to build this kind of stuff in C. Compilers are written in C only because compiler writers often care about speed - when they don't care so much, they don't write them in C. C is like assembly language - only the syntax is more complicated and it is an ANSI standard (a line I first heard from Gregor Kiczales, of CLOS fame).
> whole width of Win/Mac/Un*x platforms. The GUI could then be a separate > program that calls the core functions; that way, instead of having to > write a seperate version for each operating system, only the GUI would > need to be re-written. (Plus, this would allow both a QT/KDE and GTK/GNOME > GUI for Linux--hey, you could write a GUI in Tk...)
Write the GUI using your favorite CGI script equivalent instead. Write it once, let everyone use it.
> Mmm...just think about piping a list of syllable structures and > phonemes into a word generator, which pipes its output to a dictionary > program which randomly assigns meanings to words, then proceeds to pipe > the resulting dictionary to a transformer which creates half a dozen > daughter languages.
Yep - just imagine - every sci-fi novel on the planet could have a different language for the alien race(s) within it :-)

Brook