Re: Dictionary formats

From: <kam@...>
Date: Sunday, March 31, 2002, 14:24
On Sat, Mar 30, 2002 at 10:55:07PM -0600, Peter Clark wrote:

> If I may hijack this thread (sorry), I would like to make a general appeal
> to all the programmers on this list: please, someone, anyone, write a
> dictionary-making program. It can even be text-based (heck, I run Linux, the
> command line does not frighten the likes of me), just so long as it allows
> for multiple semantic ranges that work both ways and automatically handles
> word entry and sorting. Or tell me where I can find such a program. Please!
> I can't imagine how I'm going to handle a dictionary once I get over a
> couple hundred words.
> :Peter

At a very basic level you can do a fair bit with grep, sort and cut; their man pages are worth reading.

I'm writing bits of software to produce concordances that are sort of dictionaries, but mainly concerned with how a given word appears in the various texts. The nice thing is that with HTML you can put in links to the actual lines in the texts. At a later stage I'll probably add an option for producing something more like a normal dictionary. At present words are grouped by "headwords" (i.e. all the parts of "to see" appear together), but the headwords themselves don't yet appear.

The idea is to have a fairly simple database that can, if need be, be edited in a text editor, and then write the software to automatically produce plain text, HTML and LaTeX output. I've just been writing code to allow for notes in the text, and need to figure out the best way to add notes on dictionary entries. One advantage at present is that database entries for a particular text, word, grammatical form etc. can be extracted with grep.

This is all work in progress (and I tend to work in fits and starts) and the code is fairly spaghettoid at this point. For some sample output see

http://home.clara.net/carrot/kernmss/tgk

(Sorry about the Christian content; the historical Cornish texts I'm working on are about 95% religious, such was the stranglehold the church had on thought and literacy.)

I'd be interested to know more clearly what it is you need, in particular what sort of input you'll have: text, word lists?

Keith Mylchreest

NB: written Cornish has a fairly clear idea about what a 'word' is; this is by no means the case for all languages.
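
To make the "simple database edited in a text editor" idea a little more concrete, here is a minimal sketch in Python. It is not the code described above, which isn't shown in this message. The file layout (one tab-separated record per line: word form, headword, text id, line number), the sample records, the function names and the `TEXT.html#lN` link scheme are all assumptions made up for illustration. It groups forms under their headword and produces plain text and HTML output; LaTeX output would follow the same pattern.

```python
from collections import defaultdict
from html import escape

# Hypothetical sample database: form <TAB> headword <TAB> text id <TAB> line number.
# The layout and the records are invented for illustration.
SAMPLE = """\
saw\tsee\tT1\t12
sees\tsee\tT2\t3
go\tgo\tT1\t40
saw\tsee\tT2\t7
"""


def parse_entries(text):
    """Parse tab-separated records, skipping blank lines and '#' comments."""
    entries = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue
        form, headword, text_id, line_no = raw.split("\t")
        entries.append((form, headword, text_id, int(line_no)))
    return entries


def group_by_headword(entries):
    """Return {headword: {form: sorted list of (text_id, line_no)}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for form, headword, text_id, line_no in entries:
        grouped[headword][form].append((text_id, line_no))
    for forms in grouped.values():
        for refs in forms.values():
            refs.sort()
    return grouped


def render_plain(grouped):
    """Plain-text concordance: headword, then each form with its references."""
    lines = []
    for headword in sorted(grouped):
        lines.append(headword)
        for form in sorted(grouped[headword]):
            refs = ", ".join(f"{t} {n}" for t, n in grouped[headword][form])
            lines.append(f"  {form}: {refs}")
    return "\n".join(lines)


def render_html(grouped):
    """HTML definition list; each reference links to an assumed line anchor."""
    parts = ["<dl>"]
    for headword in sorted(grouped):
        parts.append(f"<dt>{escape(headword)}</dt>")
        for form in sorted(grouped[headword]):
            links = ", ".join(
                f'<a href="{escape(t)}.html#l{n}">{escape(t)} {n}</a>'
                for t, n in grouped[headword][form]
            )
            parts.append(f"<dd>{escape(form)}: {links}</dd>")
    parts.append("</dl>")
    return "\n".join(parts)


if __name__ == "__main__":
    grouped = group_by_headword(parse_entries(SAMPLE))
    print(render_plain(grouped))
    print(render_html(grouped))
```

Because each record sits on a single line, a file in this assumed layout can still be filtered with grep and sorted with sort before any program touches it, which is the advantage mentioned above.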