
Re: MNCH (was: magic natlang corpus harvesting)

From: Danny Wier <dawiertx@...>
Date: Friday, May 28, 2004, 6:54
From: "Emily Zilch" <emily0@...>

> { 20040527,0304 | Danny Wier }
>
> "I got 2.78 million hits for Arabic /la:/ 'no' (a ligature) with pretty
> high precision. For Hindi, there are 40,700 pages with /hai/ 'he/she/it
> is', but there may be some other Devanagari-script languages involved."
>
> The ligature LA+ALIF is used with great frequency in Farsi. In fact, I
> bet it appears in every Arabic-alifba-using natlang.
But I searched for the word /la:/ by itself, not as part of a longer word. I wasn't aware, though, that the Arabic word for 'not' is also the word for 'strand' or 'layer' in Farsi (I just looked that up), so a better test word could be used. I tried Arabic /?\ala:/ 'on, over, above', but it returned almost 6 million hits, a lot of them in Farsi. Yet no native Farsi words have either the voiceless or the voiced pharyngeal fricative.
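
For anyone who wants to try the same check on harvested text rather than on search-engine hit counts, here is a rough Python sketch. The function names are made up for illustration, and the "looks Persian" test is only a heuristic I'm assuming is good enough for a first pass: it flags text containing the four letters Persian adds to the Arabic alphabet (pe, che, zhe, gaf), which Urdu and other languages also use.

```python
import re

# The word /la:/ as it is stored in text: LAM followed by ALEF.
LA = "\u0644\u0627"

# Match LA only when it is not glued to other word characters,
# i.e. as a word by itself and not as part of a longer word.
STANDALONE_LA = re.compile(rf"(?<!\w){LA}(?!\w)")

# Letters added to the Arabic alphabet for Persian: pe, che, zhe, gaf.
# Heuristic only: Urdu and other languages use these letters too.
PERSIAN_EXTRA_LETTERS = set("\u067E\u0686\u0698\u06AF")


def count_standalone_la(text: str) -> int:
    """Count occurrences of LA that stand alone as a word."""
    return len(STANDALONE_LA.findall(text))


def looks_persian(text: str) -> bool:
    """Return True if the text contains any Persian-specific letter."""
    return any(ch in PERSIAN_EXTRA_LETTERS for ch in text)
```

A page would then count as a likely Arabic hit only if count_standalone_la() is nonzero and looks_persian() is false. A short page of Farsi could still slip through if it happens to contain none of the four letters.
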
> Of course, there may be a qualitative difference in the encoding since
> Farsi et al. use a different handwriting style, the so-called KUFIC or
> "horizontal" script, while Arabic(s) and African natlang alifba
> borrowers use the "vertical" script, but this may appear in coding
> simply as a font choice.
It's just a font choice, and Farsi can be written in any calligraphic style Arabic can be. It's Urdu that is normally written in Nastaleeq, rather than Naskh.
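
To illustrate the font-choice point, here is a small Python sketch using only the standard unicodedata module. In Unicode text the lam-alef ligature is normally stored as two ordinary letters, LAM (U+0644) and ALEF (U+0627), and the joined shape is drawn by the font. A separate precomposed ligature character (U+FEFB) exists for compatibility, and NFKC normalization folds it back into the same two letters. The variable names are mine, chosen for illustration.

```python
import unicodedata

# Usual encoding of the word: two separate letters, LAM then ALEF.
word = "\u0644\u0627"

# Compatibility character: ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM.
ligature_form = "\uFEFB"

# NFKC normalization replaces the presentation form with the plain letters.
normalized = unicodedata.normalize("NFKC", ligature_form)

print([f"U+{ord(c):04X}" for c in word])
print([f"U+{ord(c):04X}" for c in normalized])
print(normalized == word)
```

The same two code points are used whether the text ends up rendered in Naskh or Nastaleeq, so the encoding alone says nothing about the calligraphic style.
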