Re: IPA speech synthesizer
From: Eric Christopherson <rakko@...>
Date: Sunday, February 22, 2009, 4:29
On Feb 20, 2009, at 10:55 AM, Roger Mills wrote:
> BP Jonsson wrote:
>> On 2009-02-19 Arnt Richard Johansen wrote:
>>> the individual segments in a speech stream affect each
>> other so much that you can't just splice together phones
>> and get a result that sounds like speech.
>>
>> Or to put it otherwise 'segments' are just the
>> wave-tops in the stream, corresponding to when
>> the speech-organs are closest to their target
>> positions, separated by troughs/transitions
>> which actually take up most of the stream.
>> The discreteness between segments which we think
>> we perceive is a product of the analysis
>> of the sound stream which our brain performs
>> before the perceived signal even reaches
>> our consciousness.
>>
> Would it not be possible to create a machine that could read and
> reproduce spectrograms? (Or is that what Alex's "bigrams" meant?)
I think he means transitions between two segments; Wikipedia calls
them "diphones" (<http://en.wikipedia.org/wiki/Diphone>).
>
> But on second thought, that's redundant-- to create a spectrogram
> you have to (usually) make a recording first, so why not just go with
> the recording...?
I hadn't known about it before, but WP also mentions a machine that
did just that (<http://en.wikipedia.org/wiki/Pattern_playback>). But
yeah, it seems like it would make sense to just use the recording.
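(Out of curiosity, doing the Pattern Playback trick numerically is
pretty routine nowadays. Here is a sketch in Python, assuming librosa
is installed and making up the file names; since the spectrogram only
keeps magnitudes, the phase has to be guessed back, which is what the
Griffin-Lim iteration does, and it tends to sound slightly robotic:)

# Sketch: round-trip a recording through its magnitude spectrogram,
# reconstructing the discarded phase with Griffin-Lim. File names
# are hypothetical.
import librosa
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=None)   # original recording
S = abs(librosa.stft(y))                      # magnitude spectrogram only
y_hat = librosa.griffinlim(S)                 # invert by iterative phase guessing
sf.write("reconstructed.wav", y_hat, sr)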
Sai wrote:
> To recast it a bit: how hard would it be to make an IPA synthesizer
> that is at least as good as a single-language speaker, linguistically
> naïve, sounding out arbitrary IPA transcribed words using
> <http://www.phonetics.ucla.edu/course/chapter1/chapter1.html>? It should
> be good enough to be recognizable, but it doesn't need to be perfect.
You'd still have to strip the supporting /A/s from the consonant
samples.
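(If anyone felt like trying, I imagine the crude approach would be to
cut each /Ca/ recording where the vowel's sustained energy begins.
A sketch, with guessed thresholds, made-up file names, and no claim
that it would survive contact with real recordings:)

# Sketch: trim the carrier /a/ off a mono [ta] sample by cutting at the
# first stretch that stays loud for ~100 ms (a stop burst is loud too,
# but only for a frame or two). Threshold and frame size are guesses.
import numpy as np
import soundfile as sf

y, sr = sf.read("ta.wav")               # hypothetical [ta] sample, mono
frame = int(sr * 0.025)                 # 25 ms analysis frames
rms = np.array([np.sqrt(np.mean(y[i:i + frame] ** 2))
                for i in range(0, len(y) - frame, frame)])
loud = rms > 0.3 * rms.max()
cut = len(y)                            # default: keep everything
for i in range(len(loud) - 4):
    if loud[i:i + 4].all():             # four loud frames in a row = vowel
        cut = i * frame
        break
sf.write("t_only.wav", y[:cut], sr)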