Re: Language Recognition
From: H. S. Teoh <hsteoh@...>
Date: Friday, February 9, 2007, 1:27
On Thu, Feb 08, 2007 at 10:27:47PM +0100, Henrik Theiling wrote:
[...]
Y'know, this is just begging for a conlang version of that page... :-)
I'll start:
1) Ebisédian:
Roman orthography (ASCII): Uses '3' and '0' as vowel letters,
double-vowels to represent long vowels, e.g., _00_, _ee_, _aa_. Plenty
of apostrophes in words indicating stress. Case-sensitivity like
Klingon: _K_ and _k_ are different consonants.
Roman orthography (LaTeX): uses ø, multiple accents on vowels (macron,
acute), use of tear-drop accent (looks like left open single quote over
the letter), subscript tilde.
Native script (sanokí): many diacritics over and under glyphs, no
spacing between glyphs, lines may break in the middle of the word
(although this would be hard to apply without actually knowing the
word/clause/paragraph-final glyphs---but perhaps by recognizing repeated
sequences of symbols which break at different points).
In both Roman orthographies, _q_ and _x_ are not used. _r_ is relatively
common.
Common tell-tale words: _Ke_, _ve_, _ke_, _je_, _re_ (always at end of
clause); _keve_, _t0m0_, _t3m3_, _tumu_, _tama_, _timi_. Common
single-word clauses ending in -i or -ii.
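
For anyone who wants to automate the guessing, here is a rough Python
sketch that counts the ASCII-orthography clues listed above. The
function name and the returned keys are made up for illustration, and
the vowel set used for spotting doubled vowels (a, e, i, u, 3, 0) is
only what the examples above attest, so treat it as an assumption. It
also assumes that uppercase Q and X are as absent as the lowercase ones.

```python
import re

# Tell-tale words from the list above; case matters (K and k differ).
EBISEDIAN_WORDS = {"Ke", "ve", "ke", "je", "re", "keve",
                   "t0m0", "t3m3", "tumu", "tama", "timi"}

def ebisedian_clues(text):
    """Count simple orthographic clues for ASCII-romanized Ebisédian."""
    # A token is a run of letters, the digit-vowels 3 and 0, and apostrophes.
    tokens = re.findall(r"[A-Za-z30']+", text)
    return {
        # Assumption: q/x are unused in either case.
        "uses_q_or_x": any(ch in "qxQX" for ch in text),
        # How often 3 and 0 show up inside word tokens.
        "digit_vowels": sum(tok.count("3") + tok.count("0") for tok in tokens),
        # Doubled vowels such as 00, ee, aa (attested vowels only).
        "doubled_vowels": len(re.findall(r"([aeiu30])\1", text)),
        "apostrophes": text.count("'"),
        "telltale_words": sum(tok in EBISEDIAN_WORDS for tok in tokens),
    }
```

A high count of digit vowels and tell-tale words with no q or x would
point towards Ebisédian; how to weight the clues is left open here.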
2) Tatari Faran:
Roman orthography: letters c,g,l,q,v,w,x,y,z not used. Use of apostrophe
(') for glottal stop. Uses _ts_ as digraph. Only lowercase Latin letters
are used, even in proper names and at the beginning of a sentence. _d_
only occurs word-initially and _r_ only occurs medially. Frequent
occurrence of _a_.
Native script: written vertically, top-to-bottom, then left-to-right,
with diacritical marks on the left and right of the column. Letter
forms tend to be flat.
Common tell-tale words: _ka_, _kei_, _ko_, _sa_, _sei_, _so_, _na_,
_nei_, _no_ (never at the beginning of the clause); _e_ (never at the
end of a clause).
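
The Roman-orthography rules above are strict enough to check
mechanically. Below is a minimal Python sketch that reports which rules
a piece of text breaks; the function name is hypothetical, and it
assumes a "word" is simply a run of ASCII letters and apostrophes. It
only covers the spelling rules, not the clause-position rules for the
tell-tale words.

```python
import re

# Letters the post says are never used in the Roman orthography.
FORBIDDEN = set("cglqvwxyz")

def tatari_faran_violations(text):
    """List the ways `text` breaks the Roman-orthography rules above."""
    problems = []
    # Only lowercase letters are used, even in names and sentence starts.
    if any(ch.isupper() for ch in text):
        problems.append("uppercase letter")
    bad = sorted(FORBIDDEN & set(text.lower()))
    if bad:
        problems.append("unused letters: " + "".join(bad))
    # Assumption: a word is a run of letters and apostrophes.
    for word in re.findall(r"[a-z']+", text.lower()):
        if "d" in word[1:]:
            problems.append(f"non-initial d in {word!r}")
        if word.startswith("r") or word.endswith("r"):
            problems.append(f"non-medial r in {word!r}")
    return problems
```

An empty list only means the text is consistent with these rules, not
that it is actually Tatari Faran; plenty of other languages could pass
by accident on a short sample.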
Here's an idea for a conlang game: everyone submits info for their
conlang like above and we collect it all in a central place somewhere
(maybe FrathWiki?), then each person creates some sample text in their
respective conlang(s) and submits it to a central repository where
everything is shuffled and redistributed. Then each person has to guess
which conlang the text is written in based on the collected identifying
characteristics. For simplicity, maybe the first version of this game
can be restricted to only those conlangs with Latin-like orthography,
maybe including Cyrillic if there are enough of those to make it
challenging. If it works out well, we can include exotic writing systems
as well the next time round (although those will tend to be so
distinctive that you'd be able to tell immediately).
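
The shuffle-and-redistribute step could be as simple as the following
Python sketch. Everything here is hypothetical (the function name, the
shape of the submissions table, numbering the samples); it just shows
one way to hand out anonymous texts while keeping an answer key.

```python
import random

def deal_samples(submissions, seed=None):
    """Shuffle (language, text) pairs; return numbered texts and an answer key.

    `submissions` maps a conlang name to a list of sample texts.
    """
    rng = random.Random(seed)  # pass a seed for a repeatable shuffle
    pairs = [(lang, text)
             for lang, texts in submissions.items()
             for text in texts]
    rng.shuffle(pairs)
    # Players see only the numbered texts; the organizer keeps the key.
    puzzles = {i: text for i, (_, text) in enumerate(pairs, start=1)}
    answers = {i: lang for i, (lang, _) in enumerate(pairs, start=1)}
    return puzzles, answers
```

Scoring a guess is then just comparing it with `answers[i]` for the
sample number in question.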
T
--
The best compiler is between your ears. -- Michael Abrash