PROJECT AI

Pattern-Recognition--Sound


Perhaps not as widely regarded as vision, the human auditory system nonetheless plays an important role in structured thinking.  How so?  Even though the brain processes only about a million bits of data per second from the ears, as opposed to the 50 billion bits per second from each eye, people usually receive communications through the ears.  Thus, it is the meaningful language in audio messages that compels people to think about what is being said (natural language processing), as well as about how the way the message was constructed relates to what is meant.  That is why it is easier to follow a news broadcast just by listening to it than by watching it with the sound turned off (Kurzweil 265).

Speech Recognition

The main focus in AI when it comes to sound processing is building a computer that can recognize what a person says to it.  Researchers concentrate on this, rather than on making a computer recognize the sound of a car or a ringing telephone, because 1) there is usually something meaningful in what a person says and 2) automated speech recognition (ASR) would be the next step in man-machine interface (MMI).

Like vision, sound is actually analyzed in the brain, which is precisely what makes it difficult to study, because the brain is not well understood.  That is why speech researchers are more concerned with getting a computer simply to recognize speech than with getting a computer to mimic how people recognize speech (i.e. the top-down approach).

Computers of today can store many hours of sound digitally.  However, strict voice-pattern-to-voice-pattern matching is not accurate enough for a computer to realize that the voice it received and the voice it stored in memory come from the same person saying the same thing.  This is not the fault of the computer per se; people tend to speak a little differently each time they talk.  The example in the box below illustrates this point:

Sound Example #1

"Hello, my name is Duong Hang and this is Project AI!"  These are two speech spectrograms that show how much energy (color, from blue to red) was present at each frequency (Hz, vertical axis) over time (horizontal axis).  Notice that different frequencies were stressed in the two graphs, yet the spectral patterns are similar.
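As a rough illustration of what such a graph measures, the sketch below computes energy per frequency over time for a synthetic signal. It is not the tool used to produce the pictures above; the function names, the 8000 Hz sample rate, the 64-sample frame, and the test tones are all assumptions chosen to keep the example small, and it uses only the Python standard library.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of per-frequency energies for each short time frame."""
    # A Hann window tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        energies = []
        # Plain discrete Fourier transform; only non-negative frequencies.
        for k in range(frame_size // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            energies.append(abs(coeff) ** 2)
        frames.append(energies)
    return frames


def tone(freq_hz, sample_rate, n_samples):
    """Synthetic stand-in for recorded speech: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


SAMPLE_RATE = 8000   # assumed sample rate, in Hz
FRAME_SIZE = 64      # assumed frame length, in samples

# A 1000 Hz tone followed by a 2000 Hz tone.
signal = tone(1000, SAMPLE_RATE, 256) + tone(2000, SAMPLE_RATE, 256)
spec = spectrogram(signal, frame_size=FRAME_SIZE, hop=32)

bin_hz = SAMPLE_RATE / FRAME_SIZE  # width of one frequency bin in Hz
first_peak = max(range(len(spec[0])), key=lambda k: spec[0][k]) * bin_hz
last_peak = max(range(len(spec[-1])), key=lambda k: spec[-1][k]) * bin_hz
print(first_peak, last_peak)
```

Each inner list corresponds to one vertical slice of a spectrogram: the position in the list is the frequency, and the value is the energy that a picture would show as color.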

To make ASR systems smarter, researchers are trying to develop pattern-recognition programs that can recognize the similar patterns between the same speech spoken at different times, as in the sample above, between speech saying different things, and between different people saying the same thing (speaker independence).
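One classic way to compare two utterances spoken at different speeds is dynamic time warping (DTW). The text above does not name a specific method, so DTW is offered here only as an illustrative assumption, and the short number sequences are made-up stand-ins for real speech features. The sketch contrasts strict position-by-position matching with a warped alignment.

```python
def strict_distance(a, b):
    """Compare two sequences position by position (no tolerance for timing)."""
    n = min(len(a), len(b))
    return sum(abs(a[i] - b[i]) for i in range(n))


def dtw_distance(a, b):
    """Dynamic time warping: cheapest alignment that may stretch either sequence."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best total cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical feature contours, not real speech data.
template = [0, 1, 3, 5, 3, 1, 0]                      # stored utterance
slow = [0, 0, 1, 1, 3, 3, 5, 5, 3, 3, 1, 1, 0, 0]     # same shape, spoken slower
other = [5, 3, 1, 0, 1, 3, 5]                         # a different shape

print(strict_distance(template, slow))   # strict matching sees a mismatch
print(dtw_distance(template, slow))      # warping absorbs the tempo change
print(dtw_distance(template, other))     # a different pattern stays distant
```

The slower utterance fails the strict comparison even though it has the same shape, while the warped comparison treats it as a match and still separates it from the different pattern.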

ASR research has spawned various voice-activated programs, though these often require extensive training before a computer can recognize a particular voice.  Eventually, coupled with natural language processing and other intelligent thought capabilities, a computer may one day be able to carry out commands that come from conversation-like phrases.


Speech Synthesis

An aspect of sound-processing research that has made faster progress than ASR is speech synthesis.  Drawing on knowledge from ASR, such as phonemes, speech synthesizers have become relatively successful at generating understandable words and sentences.  However, making a computer speak naturally, without the stoic voice reminiscent of old science-fiction robots, remains the greater challenge ahead.

As a branch of AI, the importance of ASR and speech synthesis lies in the development of pattern-recognition programs that understand the bits of data that compose the message.  Some of the early technologies from this field have found their way into the applications market, but they still need to be refined before a computer can communicate as intelligently and naturally as people do.  Perhaps in the future, a person will be able to carry on an engaging conversation with a personal computer that combines the power of ASR, language processing, inferencing through a knowledge base, some other intelligent component that lets the computer be an active conversationalist and steer the discussion in a certain direction, and a speech synthesizer that makes the computer sound like a person.

 