Conlang: Re: Shareable/centralizable dictionary server software? (WAS: Size of your dictionary) (Sai Emrys, Apr 4 '09, 3:55)

From:	Sai Emrys <saizai@...>
Date:	Saturday, April 4, 2009, 3:55

Some other ideas (modeling it so as to be able to replace Wiktionary):

* somehow parse Wiktionary pages? (seems like a very hard task given that the style isn't uniform...)
* give multi-user edit ability with particular permissions (e.g. fully editable; moderated edits; canon vs. noncanon; single-source only)
* give sources of contributions (either a user or a standardized cite)
* add versioning at some point
* automagically find homonyms (by a phonological distance metric on the X-SAMPA / IPA) and syn/antonyms (by some sort of co-occurrence / similarity metric on definitions, plus explicitly defined relations)
* definitions have properties (e.g. vulgar, archaic, obsolete, jargon of a particular category, dialectal of a particular category)
* definitions (or words? not sure which level is more accurate) can have cross-language links for translations
* some support for compound words (this gets at the whole 'how do you support morphology' question, though...)
* automagically scan a corpus text and link all the forms of the word present? (seems Hard)

(... obviously most of these are a version 4 spec...)

On Fri, Apr 3, 2009 at 8:47 PM, Alex Fink <000024@...> wrote:

> This is a good start. But what I was getting at is that there's more to an
> etymology than a source word. For one, I may want to track the precise
> proto-form, which is more information than just a single lemma in the
> lexicon: maybe my word came from a _particular_ derivational or inflectional
> form of a proto-word, maybe it came from a coalescence of two proto-words,
> etc. For two, I may want to remark on particularities of the development
> itself, like irregular sound changes.

Well, one can always have a random text field, but that seems like a kludge. Supporting morphology is hard, but if we did that, then there'd be no problem marking some word as being derived from a particular morphological form of a particular word (rather than merely the word generically).
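A minimal sketch (Python 3.9+) of what such an etymology record might look like: a list of source references, each of which may name one particular form of a proto-word (say, a genitive plural) rather than just the lemma, plus a notes field for remarks like irregular sound changes. The class names, the `lexeme_id` strings and the form tags are all hypothetical, and it assumes the lexicon already gives every entry a stable ID and has some inventory of form tags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EtymonRef:
    """A pointer to a source entry, optionally narrowed to one specific form."""
    lexeme_id: str                  # hypothetical ID of the proto-language entry
    form_tag: Optional[str] = None  # e.g. "GEN.PL"; None = the word generically


@dataclass
class Etymology:
    # More than one source models a coalescence (or compound) of proto-words.
    sources: list[EtymonRef]
    # Free-text remarks on the development itself, e.g. irregular sound changes.
    notes: str = ""

    def is_coalescence(self) -> bool:
        return len(self.sources) > 1


# Hypothetical example: a word from the genitive plural of one proto-word
# fused with a second proto-word, with an irregularity noted.
ety = Etymology(
    sources=[EtymonRef("proto:kata", "GEN.PL"), EtymonRef("proto:ri")],
    notes="irregular loss of final -s",
)
```

The free-text `notes` field is still there, but only for commentary on the development; the structured part (which form of which word) no longer has to live in it.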

> Two is reasonably addressed by just adjoining another text field. One
> depends on how your model treats forms as opposed to the stems they come
> from -- how does it? (This bears on the irregular forms thing too.)

I don't know. Suggest something.
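One possible model, offered only as a sketch (Python 3.9+): each entry stores a stem, regular forms are generated from it by rule, and only irregular forms are stored explicitly, as overrides. Any specific form is then addressable as a (lexeme ID, form tag) pair, which is the kind of thing an etymology could point at. The suffix table, tags and names are invented for illustration and assume a purely suffixing word class; real morphology (stem changes, infixes, sound rules at morpheme boundaries) would need more than string concatenation.

```python
from dataclasses import dataclass, field

# Hypothetical regular paradigm for one word class: form tag -> suffix on the stem.
REGULAR_SUFFIXES = {"SG": "", "PL": "i", "GEN.SG": "as"}


@dataclass
class Lexeme:
    lexeme_id: str
    stem: str
    # Only exceptions are stored: form tag -> complete irregular surface form.
    irregular: dict[str, str] = field(default_factory=dict)

    def form(self, tag: str) -> str:
        """Return the surface form for a tag; stored irregulars override the rule."""
        if tag in self.irregular:
            return self.irregular[tag]
        return self.stem + REGULAR_SUFFIXES[tag]

    def paradigm(self) -> dict[str, str]:
        """Generate every form of the lexeme, regular and irregular."""
        return {tag: self.form(tag) for tag in REGULAR_SUFFIXES}


kat = Lexeme("kat-1", "kat", irregular={"PL": "ket"})
print(kat.paradigm())  # {'SG': 'kat', 'PL': 'ket', 'GEN.SG': 'katas'}
```

Storing only the exceptions keeps the data small and means that fixing a rule updates every regular form at once; the cost is that the rules have to run whenever a full paradigm is displayed.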

>> So this would be just <font face="yourspecialfont">foobár</font>
>> really. Simplest way I could think of supporting it.
>
> Oh fine. That is of course Wrong from the Unicode purists' perspective, though!

Give me a better method. :-P

- Sai
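On the homonym idea in the list above (finding homonyms by a phonological distance metric): a minimal sketch in Python 3.9+, assuming pronunciations are already stored as lists of segments, so splitting an X-SAMPA or IPA string into segments is not shown. The names `segment_distance`, `homonym_candidates` and `toy_lexicon` are hypothetical. It uses plain Levenshtein distance over segments, which is only one possible metric: distance 0 flags exact homophones, and a small threshold flags near-homophones. A feature-weighted distance would treat /k/ vs. /q/ as closer than /k/ vs. /l/, which this does not.

```python
from itertools import combinations


def segment_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over phoneme segments rather than raw characters,
    so a multi-character X-SAMPA symbol such as "tS" counts as one unit."""
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        cur = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            cur.append(min(prev[j] + 1,          # delete seg_a
                           cur[j - 1] + 1,       # insert seg_b
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]


def homonym_candidates(lexicon: dict[str, list[str]], max_distance: int = 0):
    """Yield (word1, word2, distance) for every pair of entries whose
    pronunciations are at most max_distance edits apart."""
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        d = segment_distance(p1, p2)
        if d <= max_distance:
            yield w1, w2, d


# Hypothetical toy lexicon: headword -> pre-segmented pronunciation.
toy_lexicon = {
    "kata": ["k", "a", "t", "a"],
    "qata": ["q", "a", "t", "a"],
    "cata": ["k", "a", "t", "a"],
    "chol": ["tS", "o", "l"],
}

if __name__ == "__main__":
    print(list(homonym_candidates(toy_lexicon, max_distance=0)))
    # [('kata', 'cata', 0)]
    print(list(homonym_candidates(toy_lexicon, max_distance=1)))
    # [('kata', 'qata', 1), ('kata', 'cata', 0), ('qata', 'cata', 1)]
```

Comparing every pair is quadratic in the size of the lexicon, which is fine for a personal conlang dictionary but would want some indexing (e.g. bucketing by length) for a large shared server.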