From: Garth Wallace <gwalla@...>
Date: Monday, June 28, 2004, 7:52
Sally Caves wrote:
> I don't think this could work terribly well. For now, at least. Of course,
> what do I know about "Cool Edit" and the real cutting edge stuff out there?
> :) In my experience, most contemporary voice recognition programs operate
> on a level far less complex than your actual enunciation (a step above
> lip-reading, to use an analogy), and I don't think anybody could honestly
> say, at this stage of our technological expertise, that the text readers are
> accurate renditions of pronunciation, although they are intelligible. A
> program would have to be far more complex than even our most complete IPA to
> get timbre, nuance, pitch, inflection, the minute changes that occur when a
> sound is proximate to another sound, and so forth. Think of the subtle
> distinctions between a French "r" and a German "r"; and how in German the
> guttural "r" (at least for me) will be different depending on what vowel or
> consonant comes in front of it (I can do a reasonable uvular trill for back
> vowels, but that changes for front vowels, and it's almost impossible for me
> after any consonant except "k"; after "t" it sounds like a French uvular
> scrape). It will always sound like a machine speaking, with that
> machine-sameness. And as for writing what you speak, think of all the
> kinks that have to be ironed out. If it's overly sensitive, you'll get an
> unreadable bunch of gobbledygook depending on the speech patterns of any
> individual. It's a great idea to aim for, but I think it will take a lot of
> new work to produce useable results, and until then I'd prefer to learn
> German pronunciation from an actual German. But I can see the technical
> appeal of such a project.
I think a text reader for X-SAMPA wouldn't be impossible, especially if it's strictly phonetic. The real problems start when you try to go from phonemes to sound--you need a lot of language-specific data to make it work. Of course, the utility of a phonetic text-to-speech program would be a little limited, since you'd have to do all the work of converting from a phonemic representation to a phonetic one yourself.
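
To make the phonemic-to-phonetic step concrete, here is a rough Python sketch of the kind of language-specific rule I mean. Everything in it is an assumption for illustration only: the function name, the tiny vowel set, and the single English-like rule (aspirate a word-initial voiceless stop before a vowel, written with the X-SAMPA diacritic `_h`). A real converter would need many such rules per language, and the input is assumed to be already split into X-SAMPA symbols.

```python
# Toy illustration only: one hypothetical allophonic rule, not a real rule set.
VOICELESS_STOPS = {"p", "t", "k"}
# A deliberately incomplete set of X-SAMPA vowel symbols (assumption).
VOWELS = {"a", "e", "i", "o", "u", "I", "E", "{", "A", "V", "U", "@"}


def phonemic_to_phonetic(phonemes):
    """Return a phonetic transcription for one word given as a list of
    X-SAMPA phoneme symbols.

    Applies a single English-like rule: a word-initial voiceless stop
    directly before a vowel is aspirated (marked with "_h").
    """
    phones = list(phonemes)  # copy so the caller's list is left untouched
    if (
        len(phones) >= 2
        and phones[0] in VOICELESS_STOPS
        and phones[1] in VOWELS
    ):
        phones[0] = phones[0] + "_h"
    return phones


if __name__ == "__main__":
    print(phonemic_to_phonetic(["p", "I", "n"]))       # "pin"
    print(phonemic_to_phonetic(["s", "p", "I", "n"]))  # "spin"
```

The stop in "pin" gets the aspiration mark while the one in "spin" does not, and that is exactly the sort of knowledge a strictly phonetic reader would leave to whoever writes the transcription.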


