Re: OFF: Dissociated Press
From: BP Jonsson <bpj@...>
Date: Sunday, April 9, 2000, 14:50
At 13:52 08.4.2000 -0300, FFlores wrote:
>Ah yes, there's a letter-based variant too. If you just take a space-
>separated list of words and preserve the spaces, you have your BOW/
>EOW mark. It seems to work better with English, since words are short,
>than in Spanish, for example, and better yet if you allow many sound
>clusters. The main problem is that the algorithm ignores context --
>it just knows what came immediately before. For example, see what
>happens to this paragraph (before the last "For"):
What about setting a limit on word length? I've also seen similar programs
that look for letter pairs or triplets. If "_" is any
punctuation/space/newline/tab and "*" is any letter, then you first look up
all instances of _* and pick one. If you got _a, you look for a* and append
the found * to _a; say you get _ab, then you look for b*, and so on. When you
are one letter short of the word-length limit -- say the limit is 6 and you
have _abcde -- you start looking for e*_, so the word ends on a letter that
really does close a word in the source.
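
Here is a rough Python sketch of the letter-pair lookup just described, not taken from any of the programs mentioned. The names (normalize, build_tables, make_word) are made up for illustration. Two things are my own assumptions: a word may also end early when the randomly chosen follower happens to be "_", and anything outside a-z counts as "_" (so accented letters are treated as separators).

```python
import random
import re
from collections import defaultdict

BOUNDARY = "_"  # stands for any run of punctuation/space/newline/tab


def normalize(text):
    """Lowercase the text and collapse every run of non-letters into one BOUNDARY."""
    body = re.sub(r"[^a-z]+", BOUNDARY, text.lower()).strip(BOUNDARY)
    return BOUNDARY + body + BOUNDARY


def build_tables(text):
    """Collect the pair statistics that the generator draws from."""
    s = normalize(text)
    followers = defaultdict(list)  # char -> every char seen right after it
    closers = defaultdict(list)    # letter e -> letters x where "e x _" occurs
    for i in range(len(s) - 1):
        followers[s[i]].append(s[i + 1])
        if (i + 2 < len(s) and s[i] != BOUNDARY
                and s[i + 1] != BOUNDARY and s[i + 2] == BOUNDARY):
            closers[s[i]].append(s[i + 1])
    return followers, closers


def make_word(followers, closers, limit, rng=random):
    """Build one word of at most `limit` letters, one letter pair at a time."""
    word = ""
    prev = BOUNDARY
    while len(word) < limit:
        if len(word) == limit - 1 and closers[prev]:
            # One slot left: look for "prev * _" so the word ends plausibly.
            word += rng.choice(closers[prev])
            break
        nxt = rng.choice(followers[prev])
        if nxt == BOUNDARY:
            break  # assumption: the source may end the word before the limit
        word += nxt
        prev = nxt
    return word


if __name__ == "__main__":
    source = "What about setting a limit on word length?"
    followers, closers = build_tables(source)
    print(" ".join(make_word(followers, closers, 6) for _ in range(8)))
```

If no "prev * _" pattern exists at the last slot, the sketch falls back to any follower of prev, so the final letter is not guaranteed to be word-final in the source.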
You can also start with _ab, look for ab*, add the *, look for bc*, add the *,
look for cd*, and so on. By trying different combinations of word length and
length of matching pattern, you should be able to fine-tune the degree of
resemblance between source and output.
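
To make the tuning idea concrete, here is a small Python sketch in which the length of the matching pattern is a parameter. It is only an illustration under my own assumptions: the names (build_model, make_word, order) are hypothetical, anything outside a-z is treated as "_", the word is simply cut off at the limit rather than closed with a word-final pattern, and limit is assumed to be at least order - 1.

```python
import random
import re
from collections import defaultdict

BOUNDARY = "_"  # stands for any run of punctuation/space/newline/tab


def normalize(text):
    """Lowercase the text and collapse every run of non-letters into one BOUNDARY."""
    body = re.sub(r"[^a-z]+", BOUNDARY, text.lower()).strip(BOUNDARY)
    return BOUNDARY + body + BOUNDARY


def build_model(text, order):
    """Map each `order`-character pattern to the characters seen right after it."""
    s = normalize(text)
    model = defaultdict(list)
    starts = []  # patterns that begin at a word boundary, e.g. "_a" for order 2
    for i in range(len(s) - order):
        ctx = s[i:i + order]
        model[ctx].append(s[i + order])
        if ctx[0] == BOUNDARY and BOUNDARY not in ctx[1:]:
            starts.append(ctx)
    return model, starts


def make_word(model, starts, limit, rng=random):
    """Build one word by repeatedly matching the last `order` characters."""
    if not starts:
        raise ValueError("source has no word long enough for this pattern length")
    ctx = rng.choice(starts)
    word = ctx[1:]  # the letters after the leading boundary
    while len(word) < limit:
        nxt = rng.choice(model[ctx])
        if nxt == BOUNDARY:
            break  # the source ends the word here
        word += nxt
        ctx = ctx[1:] + nxt  # slide the pattern one character to the right
    return word


if __name__ == "__main__":
    source = ("Trying different combinations of word length and length of "
              "matching pattern you should be able to fine-tune the degree "
              "of resemblance between source and output.")
    for order in (1, 2, 3):
        model, starts = build_model(source, order)
        words = [make_word(model, starts, 8) for _ in range(6)]
        print(order, " ".join(words))
```

With order=1 each letter depends only on the one before it; order=2 corresponds to starting from _a and then matching ab*, bc*, and so on. Longer patterns should give output closer to the source, since fewer continuations match each pattern.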
/BP
B.Philip Jonsson bpj@netg.se
melroch@my-deja.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~__
Anant' avanaute quettalmar! \ \
__ ____ ____ _____________ ___ __ __ __ / /
\ \/___ \\__ \ /___ _____/\ \\__ \\ \ \ \\ \ / /
/ / / / / \ / /Melroch\ \_/ // / / // / / /
/ /___/ /_ / /\ \ / /Melarocco\_ // /__/ // /__/ /
/_________//_/ \_\/ /Eowine__ / / \___/\_\\___/\_\
I neer Pityancalimeo\ \_____/ /ar/ /_atar Mercasso naan
~~~~~~~~~Cuinondil~~~\_______/~~~\__/~~~Noolendur~~~~~~
|| Lenda lenda pellalenda pellatellenda cuivie aiya! ||
"A coincidence, as we say in Middle-Earth" (JRR Tolkien)