The Language Code, take 2 (or 3)

From:	Dirk Elzinga <dirk_elzinga@...>
Date:	Tuesday, June 10, 2003, 15:16
Hey.

Since discussion on the Language Code (such as it was) has died down, I
now take this opportunity to present the revised version, incorporating
the comments I received. Feel free to suggest additions or other
changes.

-=-=-=-=-=-=-=-=-=-

The Language Code


Introduction.

The Language Code began as a tongue-in-cheek imitation of the Geek Code
(http://www.geekcode.com), but it can be used as a tool to create a
typological "thumbnail sketch" of any given language, natural or
constructed, without locking that language into categories which might
be misleading; for example, all languages exhibit some degree of
agglutination, but not all languages are "agglutinating." Using scalar
values for variables is meant to suggest the continuous nature of
linguistic categories.

There are six main categories, each of which is subdivided into
varying numbers of subcategories: i) Type of language, ii) Phonology,
iii) Writing system, iv) Morphology, v) Syntax, and vi) Lexicon. Any
idiosyncrasies in the choice and values given for the subcategories
within these major categories are my responsibility and do not
necessarily reflect conventional wisdom with respect to language
description. However, I do think it's a useful list.


How to use the Language Code.

First, find a language you're interested in. Then go through the Code
and determine the values for each of the categories. Some things to
keep in mind:

*  If you're not sure what the value should be, you can always use a
question mark following the category label.

*  If a category doesn't apply to your language, place an asterisk "*"
following the category label. For example, if the language you're
interested in does not have a writing system (other than the phonetic
transcription), you would encode this fact as "W*".

*  Sometimes the value of a category falls within a range; you can
indicate the range by using parens. For example, Ma+(++) could indicate
a language which has some degree of agglutination, but which perhaps
varies by part of speech so that nouns show little agglutination (a+)
but verbs show more (a+++). Or it might indicate a language which is
still under development, with the enclosed values showing the planned
range.

*  If your language has a value for a particular category, but that
category isn't listed in the Code, you may always use the value 'o' for
"other". (You may also petition me to have your value included in
future versions of the Language Code.)
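
The range convention can be read mechanically. What follows is a
minimal Python sketch, not part of the Code itself. It assumes a ranged
value is a lowercase label, a run of "+" or "-" signs, and a
parenthesized run of further signs, as in a+(++). The name
expand_range and the pattern are hypothetical, chosen only for
illustration.

```python
import re

# Assumed shape of a ranged value: label, base signs, extra signs in parens,
# e.g. "a+(++)". This pattern is an illustration, not an official grammar.
RANGED = re.compile(r"([a-z/]+)([+-]*)\(([+-]+)\)")

def expand_range(value):
    """Return the (low, high) ends of a ranged value such as 'a+(++)'."""
    match = RANGED.fullmatch(value)
    if match is None:
        # No parenthesized part: treat the value as a single point.
        return value, value
    label, base, extra = match.groups()
    return label + base, label + base + extra

print(expand_range("a+(++)"))  # ('a+', 'a+++')
```

A value without parens comes back unchanged at both ends, so ranged
and plain values can be handled in the same way.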

When you are done, you will have a string of categories and values
which will provide a typological profile of your language. By itself
this information may not be particularly informative; the true value of
the Code will come in the head-to-head comparison with other languages
using the Code and its categories as a common vocabulary of comparison.


T       type
        f       fictional
        l       logical
        x       auxiliary
        p       personal
        n       natural
        o       other

P       phonology
        t       tonal
                c       contour tones
                        r       register
                        #       number of tones
                l       level tones
                        !       downstep/downdrift
                        #       number of tones
        p       phonemes
                +/-     allophony
                #       consonant phonemes
                #       vowel phonemes
        s       syllable template {c,v}

W       writing system
        n       natural
        c       constructed
        t       type of script
                f       featural (Hangul, Tengwar)
                c       abjad ("Consonantal")
                d       abugida ("Devanagari")
                a       alphabet
                s       syllabic
                l       logographic
                o       other
        r       regularity/irregularity (+/-)

M       morphology
        a       agglutinating (+/-)
        i       isolating (+/-)
        f       inflecting (+/-)
        h       head-marking (+/-)
        d       dependent-marking (+/-)
        t#      number of distinct tenses
        a#      number of distinct aspects
        m#      number of distinct moods
        t/a#    number of distinct tense/aspect combinations (where a
                meaningful distinction between tense and aspect cannot be
                made) (also t/m, a/m, etc)
        c#      number of distinct cases
        g#      number of genders or noun classes
        n#      number of number distinctions

S       syntax
        b       basic word order {v,s,o} (may substitute dots when the terms
                s = 'subject' and o = 'object' are not meaningful or when word
                order is not fixed)
        arg     argument alignment
                n       nominative/accusative
                e       ergative/absolutive
                a       active/stative
                h       hierarchical
                t       topic/focus
                s       split/mixed system
                r       semantic role
                o       other
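
As an illustration of how the S section reads, here is a rough Python
sketch that unpacks a string such as Sbsvoargn into its word order and
alignment. It assumes the section is written exactly as "S", "b", three
characters drawn from v, s, o or a dot, then "arg" and one alignment
letter. It does not handle "?" or "*", and the names read_syntax and
ALIGNMENTS are hypothetical.

```python
import re

# Alignment letters copied from the "arg" table above.
ALIGNMENTS = {
    "n": "nominative/accusative",
    "e": "ergative/absolutive",
    "a": "active/stative",
    "h": "hierarchical",
    "t": "topic/focus",
    "s": "split/mixed system",
    "r": "semantic role",
    "o": "other",
}

# Assumed layout: S, b, a three-slot word order, "arg", one alignment letter.
SYNTAX = re.compile(r"Sb([vso.]{3})arg([neahtsro])")

def read_syntax(section):
    """Return (word_order, alignment_name) for a section like 'Sbsvoargn'."""
    match = SYNTAX.fullmatch(section)
    if match is None:
        raise ValueError("not a syntax section this sketch understands: " + section)
    order, letter = match.groups()
    return order, ALIGNMENTS[letter]

print(read_syntax("Sbsvoargn"))  # ('svo', 'nominative/accusative')
```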

L       lexicon
        c       compounding/incorporation (+/-)
        d       derivation (+/-)
        #       number of words so far

English: Tn Pt*p++24,9(c)v(c) Wntar-- Mi++f+dt2a3c2n2 Sbsvoargn
Lc++d+1000000+
Shoshoni: Tn Pt*p+++12,6(c)v(v/c) Wntar++++ Ma++f+h++d+t/a13c3n3
Sbsovargn Lc+++d++25000?
Tepa: Tf Pt*p+++11,4s(c)v(v/c) W* Mf++h+++t*a2c*n4 Sbv..argh Lcd+600
Shemspreg: Tp Pt*p+22,5s(c)v(c) Wnar+++ Mf+d+++t/a3c3n2 Sbsvoargn
Lc+d++1000
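
For head-to-head comparison it can help to break each entry into its
six sections first. The Python sketch below is one possible way to do
that, assuming (as in the four examples above) that the language name
is followed by a colon, that sections are separated by whitespace or
line breaks, and that each section begins with its category letter. The
name split_code is hypothetical.

```python
# Category letters as listed in the Code.
CATEGORIES = {
    "T": "type",
    "P": "phonology",
    "W": "writing system",
    "M": "morphology",
    "S": "syntax",
    "L": "lexicon",
}

def split_code(entry):
    """Split 'Name: Tn Pt*...' into (name, {category: value string})."""
    name, _, code = entry.partition(":")
    sections = {}
    # split() with no argument also absorbs the line break in wrapped entries.
    for chunk in code.split():
        # An unknown leading letter raises KeyError; this sketch does not recover.
        sections[CATEGORIES[chunk[0]]] = chunk[1:]
    return name.strip(), sections

name, sections = split_code(
    "English: Tn Pt*p++24,9(c)v(c) Wntar-- Mi++f+dt2a3c2n2 Sbsvoargn\n"
    "Lc++d+1000000+"
)
print(name, sections["morphology"])  # English i++f+dt2a3c2n2
```

With every entry in this form, two languages can be compared section
by section rather than as whole strings.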

--
Dirk Elzinga
Dirk_Elzinga@byu.edu

"I believe that phonology is superior to music. It is more variable and
its pecuniary possibilities are far greater." - Erik Satie