Re: IPA speech synthesizer
From: Gary Shannon <fiziwig@...>
Date: Friday, February 20, 2009, 20:49
--- On Fri, 2/20/09, Philip Newton <philip.newton@...> wrote:
> > Would it not be possible to create a machine that could read and
> > reproduce spectrograms? (Or is that what Alex's "bigrams" meant?)
>
> I interpreted "bigrams" as combinations of phones.
>
> So that, for example, "conlang" would be, conceptually, produced by
> splicing together the bigrams for [kQ] [Qn] [nl] [l&] [&N]. (Not sure
> whether [#k] and [N#] would also get bigrams; possibly so.)
>
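The decomposition Philip describes can be sketched in a few lines. This is
only an illustration of the idea, not anyone's actual synthesizer: the function
name phone_bigrams and the optional "#" boundary marker are assumptions made
for the example, and the phone list is the one from his "conlang" example.

```python
def phone_bigrams(phones, boundary=None):
    """Return the adjacent phone pairs of a phone sequence.

    If `boundary` is given (e.g. "#"), it is added at both ends so that
    word-initial and word-final pairs such as [#k] and [N#] are included.
    """
    seq = list(phones)
    if boundary is not None:
        seq = [boundary] + seq + [boundary]
    # Pair each phone with the one that follows it.
    return list(zip(seq, seq[1:]))


conlang = ["k", "Q", "n", "l", "&", "N"]
print(phone_bigrams(conlang))
print(phone_bigrams(conlang, boundary="#"))
```

A synthesizer built this way would look up one recorded snippet per pair and
splice the snippets together in order.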
The biggest problem to my (uninformed) mind is that of intonation. Bigrams, even
very well done bigrams (or trigrams, or N-grams), will still produce monotone
renderings.

Just out of curiosity I took a couple of seconds of natural speech from a
recorded class lecture and filtered out the higher harmonics to emphasize the
tonal changes. Then, to make those tonal changes easier to pick out, I doubled
the frequency and put the original speech on the left channel and the tonal
tracking on the right channel. You can hear how much frequency variation there
is in even such a short sample of natural speech by clicking on this mp3 file:
<http://fiziwig.com/he_was_born_etc.mp3> (Pan from left to right to hear each
track separately.)
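
For anyone who wants to try something similar, here is a rough sketch of the
three steps described above: low-pass filtering, frequency doubling, and
pairing the two tracks as left/right channels. The post does not say which
tools or settings produced the mp3, so everything specific here is an
assumption: the one-pole filter, the cutoff passed to it, and doubling by
squaring (which only works cleanly when the filtered signal is close to a
single sine wave). All function names are made up for the example.

```python
import math


def low_pass(samples, cutoff_hz, rate_hz):
    """One-pole low-pass filter: attenuates content above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate_hz
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out


def double_frequency(samples):
    """Square the signal and subtract the mean of the squares.

    For a near-sinusoid, sin^2(wt) = (1 - cos(2wt)) / 2, so the result
    is a tone at twice the original frequency with the DC offset removed.
    """
    squared = [s * s for s in samples]
    mean = sum(squared) / len(squared)
    return [s - mean for s in squared]


def to_stereo(left, right):
    """Pair two mono tracks as (left, right) sample frames."""
    return list(zip(left, right))


def zero_crossings(samples):
    """Count sign changes, a crude indicator of frequency."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))


if __name__ == "__main__":
    rate = 8000  # assumed sample rate in Hz
    # Stand-in for speech: a steady 100 Hz tone, one second long.
    speech = [math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
    tracking = double_frequency(low_pass(speech, 200, rate))
    frames = to_stereo(speech, tracking)
    print(len(frames), zero_crossings(speech), zero_crossings(tracking))
```

With real speech the samples would come from a decoded audio file, and the
frames could be written out with Python's standard wave module.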

Just this one quickie experiment suggests to me that the most significant problem
in natural-sounding synthesis is going to be patterns of intonation.
--gary