Re: Shareable/centralizable dictionary server software? (WAS: Size of your dictionary)
From: Sai Emrys <saizai@...>
Date: Saturday, April 4, 2009, 3:32
On Fri, Apr 3, 2009 at 8:19 PM, Alex Fink <000024@...> wrote:
>>* every entry belongs to
>>- root(s) (e.g. _kitaab_ -> *ktb; same can be used for etymologies)
> But there should be a more flexible etymology feature (one that lets me
> specify an exact preform, or irregular developments, or ...) too. Even if
> just a flat text field, though that's unintelligent.
I think you misunderstood.
My proposal is that every entry can be derived from (i.e. belong to)
multiple other entries (typically just 1, but hey).
So for example if you have an entry for qux (in modern fooish), you
could say that it derives from another entry, qukh (in middle fooish),
with a sibling kukh (which means something else in modern fooish).
So words are all in a tree (or graph, if you really want) structure of
derivation. You just add entries for your previous forms and it works
by the magic of RDBMS.
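
A rough sketch of how that derivation graph might look in a relational store, using Python's built-in sqlite3 module. The table and column names (entry, derivation, collection) and the helper functions are made up for illustration; nothing here comes from an existing tool.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    id         INTEGER PRIMARY KEY,
    form       TEXT NOT NULL,
    collection TEXT NOT NULL          -- e.g. 'modern fooish'
);
-- One row per derivation edge; an entry may have several parents,
-- which makes the structure a graph rather than a strict tree.
CREATE TABLE derivation (
    child_id  INTEGER NOT NULL REFERENCES entry(id),
    parent_id INTEGER NOT NULL REFERENCES entry(id),
    PRIMARY KEY (child_id, parent_id)
);
""")

conn.executemany(
    "INSERT INTO entry (id, form, collection) VALUES (?, ?, ?)",
    [
        (1, "qukh", "middle fooish"),
        (2, "qux", "modern fooish"),
        (3, "kukh", "modern fooish"),
    ],
)
# qux and kukh both derive from qukh.
conn.executemany(
    "INSERT INTO derivation (child_id, parent_id) VALUES (?, ?)",
    [(2, 1), (3, 1)],
)


def ancestors(conn, entry_id):
    """Return the forms of every entry the given entry ultimately derives from."""
    rows = conn.execute(
        """
        WITH RECURSIVE anc(id) AS (
            SELECT parent_id FROM derivation WHERE child_id = ?
            UNION
            SELECT d.parent_id FROM derivation d JOIN anc ON d.child_id = anc.id
        )
        SELECT e.form FROM entry e JOIN anc ON e.id = anc.id ORDER BY e.form
        """,
        (entry_id,),
    )
    return [r[0] for r in rows]


def siblings(conn, entry_id):
    """Return the forms of other entries sharing at least one parent."""
    rows = conn.execute(
        """
        SELECT DISTINCT e.form
        FROM derivation mine
        JOIN derivation other ON other.parent_id = mine.parent_id
        JOIN entry e ON e.id = other.child_id
        WHERE mine.child_id = ? AND other.child_id != ?
        ORDER BY e.form
        """,
        (entry_id, entry_id),
    )
    return [r[0] for r in rows]
```

Calling ancestors(conn, 2) walks up from qux, and siblings(conn, 2) finds kukh via the shared parent. Keeping the edges in their own table is what allows more than one parent per entry.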
>>- language(s) (e.g. old fooish)
> "Dialect", sure; "diachronic stage", I suppose (for etyms to refer to);
> "language" broadly...?
I'm making no linguistic assertions here, so hush on the terminology. :-P
Just saying each word belongs to some word-collection (i.e. a language
of some particular dialect at some particular diachronic stage yada
>>... and has:
>>- an xsampa, UTF8 romanization, and UTF8 custom font form
> UTF8? Who's gonna have their conscript in Unicode?
My presumption (perhaps inaccurate?) is that any custom font will use
Unicode underlyingly - i.e. you type some string of Unicode and it
outputs as something fancy in that font.
So this would be just <font face="yourspecialfont">foobár</font>
really. Simplest way I could think of supporting it.
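
A minimal sketch of that rendering step, assuming the stored custom-font form is an ordinary Unicode string and the font name is kept alongside it. The function name render_custom_form is hypothetical; html.escape is from the Python standard library.

```python
from html import escape


def render_custom_form(text, font_face):
    """Wrap a stored Unicode string in a <font> tag naming the custom font.

    Both values are HTML-escaped so that stray markup characters in an
    entry cannot break the page. (Hypothetical helper, for illustration.)
    """
    return '<font face="{}">{}</font>'.format(
        escape(font_face, quote=True), escape(text)
    )


print(render_custom_form("foobár", "yourspecialfont"))
```

The underlying text stays plain Unicode in the database; only the display layer knows about the font.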
> Don't forget
> - morphological data ("n-stem", "third conjugation", "ablauts to form the
> past stem", "irregular plural /tsu:xnu/", what have you).
> In fact it would be nice if the tool were integrable with some morphological
> tools, flexible enough to give you the forms of the stored words.
Right. That's a more complex thing, though, and I'd have to describe
it at a meta level (or just punt and have it be just a random
> Plus SIL Shoebox (or Toolbox these days?), which already has many useful
> features in its format, and is used by lots of field linguists and stuff.
> In fact one should probably be familiar with Shoebox's features before
> embarking on this (I'm not, not beyond the barest), but e.g. it was made to
> be able to track instances of your words in your corpus much like the thing
> you suggest for the example sentences.
Right. I'mma research it. Will see if I can get the source code & a format spec.