
Re: OFF: Dissociated Press

From: BP Jonsson <bpj@...>
Date: Sunday, April 9, 2000, 14:50
At 13:52 08.4.2000 -0300, FFlores wrote:

> Ah yes, there's a letter-based variant too. If you just take a
> space-separated list of words and preserve the spaces, you have your
> BOW/EOW mark. It seems to work better with English, since words are
> short, than in Spanish, for example, and better yet if you allow many
> sound clusters. The main problem is that the algorithm ignores context --
> it just knows what came immediately before. For example, see what
> happens to this paragraph (before the last "For"):

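A minimal Python sketch of the letter-based variant in the quote, assuming the input is first reduced to single-space-separated words: each output character is drawn at random from the characters that follow the previous character somewhere in the source, so the preserved spaces act as the BOW/EOW marks and nothing beyond the immediately preceding character is remembered. The function names `build_followers` and `dissociate` are hypothetical.

```python
import random
from collections import defaultdict

def build_followers(text):
    """Map each character to every character that follows it in `text`.
    Repeats are kept, so frequent successors are picked more often."""
    followers = defaultdict(list)
    for prev, nxt in zip(text, text[1:]):
        followers[prev].append(nxt)
    return followers

def dissociate(text, length, seed=None):
    """Generate `length` characters, each chosen only from what followed
    the previous character in the source (no wider context)."""
    rng = random.Random(seed)
    # Collapse all whitespace to single spaces and pad both ends, so the
    # space doubles as the beginning/end-of-word mark.
    text = " " + " ".join(text.split()) + " "
    followers = build_followers(text)
    current, out = " ", []
    for _ in range(length):
        choices = followers.get(current)
        if not choices:
            break
        current = rng.choice(choices)
        out.append(current)
    return "".join(out)
```

For example, `dissociate(some_paragraph, 300, seed=1)` produces 300 characters of pseudo-words; since only one character of context is used, the output drifts quickly away from the source, which is exactly the weakness the quote points out.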
What about setting a limit on word length? I've also seen similar programs that look for letter pairs or triplets. If "_" is any punctuation/space/newline/tab and "*" is any letter, then you first look up all instances of _*; if you got _a, you look for a* and append the found * to _a, giving _ab; then you look for b*, and so on. When you reach one less than the word-length limit (say the limit is 6 and you have _abcde), you start looking for e*_ instead, so that the next letter ends the word. You can also start with _ab, look for ab*, add the *, then look for bc*, add the *, look for cd*, and so on.

By trying different combinations of word-length limit and pattern length, you should be able to fine-tune how closely the output resembles the source.

/BP

B.Philip Jonsson  bpj@netg.se  melroch@my-deja.com

|| Lenda lenda pellalenda pellatellenda cuivie aiya! ||
"A coincidence, as we say in Middle-Earth" (JRR Tolkien)
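A minimal Python sketch of the pair/triplet approach with a word-length limit, under assumptions not in the post: words are runs of letters, lowercased, and the boundary "_" stands for whatever separated them; `k` is the length of the matching pattern (1 for letter pairs, 2 for triplets) and `max_len` is the word-length limit. When a word is one letter short of the limit, only letters that end a word after the current context are accepted, mirroring the "look for e*_" step. The post does not say what to do if no such letter exists, so the sketch simply ends the word there. All function names are hypothetical.

```python
import random
import re
from collections import defaultdict

BOUNDARY = "_"  # stands for any space/punctuation/newline/tab

def build_tables(text, k):
    """Record which character follows each context of up to k characters.
    Words are padded as "_word_". `follow` maps a context to possible next
    characters ("_" means a word may end there); `final` maps a context to
    letters that end a word right after it."""
    follow, final = defaultdict(list), defaultdict(list)
    for w in re.findall(r"[^\W\d_]+", text.lower()):  # runs of letters
        padded = BOUNDARY + w + BOUNDARY
        for i in range(1, len(padded)):
            follow[padded[max(0, i - k):i]].append(padded[i])
        j = len(padded) - 2  # index of the word's last letter
        final[padded[max(0, j - k):j]].append(padded[j])
    return follow, final

def make_word(follow, final, k, max_len, rng):
    word = BOUNDARY
    while True:
        ctx = word[-k:]  # "_" at first, then "_a", "ab", "bc", ...
        if len(word) == max_len:  # i.e. max_len - 1 letters so far
            # One letter left: only take letters seen before a word end.
            # If there are none, stop here (an assumption).
            options = final.get(ctx)
            if options:
                word += rng.choice(options)
            return word[1:]
        options = follow.get(ctx)
        if not options:
            return word[1:]
        nxt = rng.choice(options)
        if nxt == BOUNDARY:  # the source lets a word end here
            return word[1:]
        word += nxt

def babble(text, n_words, max_len=6, k=1, seed=None):
    """k=1 matches letter pairs, k=2 letter triplets, and so on."""
    if k < 1 or max_len < 1:
        raise ValueError("k and max_len must be at least 1")
    rng = random.Random(seed)
    follow, final = build_tables(text, k)
    words = (make_word(follow, final, k, max_len, rng) for _ in range(n_words))
    return " ".join(w for w in words if w)
```

Comparing, say, `babble(src, 50, max_len=6, k=1)` with `babble(src, 50, max_len=8, k=2)` is one way to try the fine-tuning suggested above: a longer pattern should keep the output closer to the source's spelling, while the length limit mainly controls how long the invented words can run.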