Text-to-speech: Talking Computers!

One aspect of speech technology is the conversion of written text (on the computer) to speech. Most people still think of metallic voices when they hear this. Indeed, the text-to-speech technology of past years wasn't that great.
But anyone who has listened to a recent text-to-speech program will tell you that computer-generated voices are quite clear now. They're still a little robotic, but understandable.

One way to manage this is to record words (or parts of words) and paste them together so that they form continuous speech. This method works well for purposes that use a small, fixed vocabulary, such as telephone information services. It isn't suitable for diverse texts.
For texts with a very large vocabulary, the computer must generate the speech itself.
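
The record-and-paste method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CLIPS table, its four words and the speak function are made up for this example, and each "recording" is a short list of numbers standing in for real audio samples.

```python
# Hypothetical clip table: every word in the fixed vocabulary maps to one recording.
# The lists of numbers are stand-ins for audio samples loaded from sound files.
CLIPS = {
    "the": [0.1, 0.2],
    "time": [0.3, 0.4, 0.5],
    "is": [0.6],
    "nine": [0.7, 0.8],
}


def speak(sentence, clips=CLIPS):
    """Paste the recorded clips for each word together, in order."""
    samples = []
    for word in sentence.lower().split():
        if word not in clips:
            # A word outside the fixed vocabulary cannot be spoken at all.
            raise KeyError(f"no recording for {word!r}")
        samples.extend(clips[word])
    return samples


audio = speak("The time is nine")
```

The KeyError branch shows why the method breaks down for diverse texts: any word that was never recorded simply cannot be played.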

You could record every word in the English language and then play the recordings in sequence to generate speech, but this sounds very robotic, and the sound files would occupy a huge amount of disk space.
If the computer itself generates the speech, no sound files are needed.

Another technique is to record every transition between the different sounds in a language. For example, for 'st' (as in 'stop') you would record the transition between the s and the t. You can then generate fluid speech by pasting those transitions together as needed. Suppose there are 50 different sounds in a language; then there would be up to 50 x 50 = 2500 transitions in that language.
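
The transition idea (often called diphone synthesis) can also be sketched in Python. The function names below are made up for this example, and the sounds of 'stop' are written as plain letters rather than a real phonetic transcription, so treat it as a rough illustration only.

```python
def count_transitions(num_sounds):
    """Upper bound on transitions: every sound may be followed by every sound."""
    return num_sounds * num_sounds


def to_transitions(sounds):
    """Split a sequence of sounds into the neighbouring pairs that must be recorded."""
    return list(zip(sounds, sounds[1:]))


print(count_transitions(50))  # 2500

# Assumed, simplified sound sequence for the word 'stop'.
print(to_transitions(["s", "t", "o", "p"]))
# [('s', 't'), ('t', 'o'), ('o', 'p')]
```

Each pair in the result stands for one recorded transition. Speaking a word means playing its pairs back to back, so 2500 recordings are enough to cover any text in this imagined 50-sound language.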

