PROJECT AI

Pattern-Recognition--Vision

[ Capturing an Image in a Numeric Array | The Detection of Lines and Boundaries | Descriptions of Texture | Matching with a Model | Conflicts with Reality ]

Computers, with the aid of appropriate camera equipment, can easily view objects just as we do.  However, they do not interpret images and objects in the way that we can.  For example, when we see a sunny beach, we can make out palm trees, specks of sand, the ocean, birds, and surfers hitting the waves.  In that same scene, all a computer "sees" is a two-dimensional grid of pixels (short for "picture elements"; the thousands of points on a computer screen from which digital images are formed) (Herndon, 121) with varying colors and degrees of brightness based on numerical values.  The goal of artificially intelligent machines that use computerized vision is to be able to recognize objects from any angle, even when the objects themselves are slightly distorted.

Capturing an Image in a Numeric Array

Image formation is the most technically developed stage of machine vision.  A camera records the amount of light reflected into it from the surfaces of objects in a three-dimensional scene.  The information is then transmitted through a converter that changes the analog signals into digital information that the computer can interpret.  The digital samples represent positions on a range of brightness, or intensity, values called a gray scale.  These numbers are formed into a two-dimensional grid called a gray-level array.  Each value in the array makes up one pixel of the digitized image.  AI-vision systems commonly use gray scales with values that range from 255 (lightest) to zero (darkest).  Color vision systems, however, take three separate measurements for the intensities of red, green, and blue (RGB).  This digital RGB system is the basis for the digital TVs that will become the standard in television. (Herndon, 19)
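The conversion from analog samples to a gray-level array can be sketched in a few lines of Python.  This is only an illustration: the function names (quantize, to_gray_level_array, quantize_rgb) and the assumption that the camera delivers brightness as a fraction between 0.0 and 1.0 are ours, not Herndon's.

```python
def quantize(sample, levels=256):
    """Map an analog brightness in [0.0, 1.0] to an integer gray level.

    Assumption for this sketch: 0.0 is darkest and 1.0 is lightest,
    giving gray levels from 0 (darkest) to 255 (lightest).
    """
    clamped = min(max(sample, 0.0), 1.0)           # guard against out-of-range readings
    return min(int(clamped * levels), levels - 1)  # keep 1.0 inside the top level


def to_gray_level_array(analog_rows):
    """Turn a grid of analog samples into a two-dimensional gray-level array."""
    return [[quantize(sample) for sample in row] for row in analog_rows]


def quantize_rgb(red, green, blue):
    """A color pixel carries three separate measurements instead of one."""
    return (quantize(red), quantize(green), quantize(blue))


# A tiny hypothetical 2 x 3 scene, brightness rising from left to right.
analog_scene = [
    [0.0, 0.25, 0.5],
    [0.5, 0.75, 1.0],
]
gray = to_gray_level_array(analog_scene)
print(gray)  # [[0, 64, 128], [128, 192, 255]]
```

Every later stage described on this page works on a grid of numbers like this one rather than on the scene itself.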

The Detection of Lines and Boundaries

Once the computer has received the stream of numbers representing the varying light intensities reflected from a scene in the three-dimensional world, it has to work out what those numbers mean.  The first step is edge detection, in which the computer makes outlines of objects or parts of objects.  To do this, the computer searches for the sudden changes in brightness values that are associated with edges, such as those that result from surface creases, object boundaries, or changes in color.  However, these edges can sometimes be masked by noise (minor variations in intensity caused by surface texture or imperfections such as scratches, and by electronic fluctuations inherent in the digitizing process).  To screen out noise, the computer must erase or reduce these insignificant variations by a process known as smoothing.  In smoothing, the value of each pixel in the array is replaced by an average of itself and its neighbors.  The larger the number of pixels that are averaged, the smoother the picture is.  Once the image has been smoothed and its edges found, an object looks like an outline of itself. (Herndon, 19)
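Smoothing and edge detection can be sketched as follows.  The 3 x 3 averaging window, the threshold of 50 gray levels, and the function names smooth and edges are assumptions chosen for illustration; real systems use many different window sizes and edge operators.

```python
def smooth(image):
    """Replace each pixel with the average of itself and its neighbors.

    Uses a 3 x 3 window; pixels on the border average only the
    neighbors that actually exist.
    """
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            result[y][x] = sum(window) // len(window)
    return result


def edges(image, threshold=50):
    """Mark a pixel with 1 where brightness changes suddenly.

    A pixel is an edge when it differs from its right-hand or lower
    neighbor by more than `threshold` gray levels.
    """
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            right = abs(image[y][x] - image[y][x + 1]) if x + 1 < width else 0
            below = abs(image[y][x] - image[y + 1][x]) if y + 1 < height else 0
            if max(right, below) > threshold:
                result[y][x] = 1
    return result


# A real boundary: dark surface on the left, bright surface on the right.
step = [[10, 10, 200, 200],
        [10, 10, 200, 200]]
print(edges(step))  # [[0, 1, 0, 0], [0, 1, 0, 0]]

# Noise: a flat surface with one stray bright pixel in the middle.
noisy = [[100, 100, 100],
         [100, 160, 100],
         [100, 100, 100]]
print(edges(noisy))          # the stray pixel produces false edges
print(edges(smooth(noisy)))  # after smoothing, no edges remain
```

The trade-off is that averaging over a larger window removes more noise but also blurs genuine boundaries, so the window size has to suit the scene.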

Descriptions of Texture

One way for computer-vision systems to pick out an object is by performing texture analysis.  Since a particular texture is represented in the gray-level array as a particular pattern of brightness values, sudden changes in texture could indicate changes in the physical surface.  For example, if you were to look at an aerial photograph of a highway in the middle of the desert, the texture of the road would stand out from the textures of the desert, thereby indicating a change in surface.  This is something we do so naturally and easily, yet it is much more complex in the computing world of bits and bytes.  Two ways of describing the texture of a surface are structural analysis and statistical analysis.  In structural analysis, the system looks for features and the relationships among them.  For example, on a pineapple there are specific woody sections of thorns arranged in diagonal rows, which would be identified as specific features arranged in a certain pattern.  Using edge detection, a computer can discern the scales that make up the skin of a fish, and so identify a fish by its scales.  Statistical analysis is used when specific features, such as thorns, are not easily discernible or are not visible.  In this method, the computer focuses on the relationship between a single pixel and its neighbors, analyzing the probability that a pixel's intensity is similar to that of its neighbors.  As a result, the computer can judge how rough a texture is, how strongly its colors contrast, and how regular its features are, if it has any.  Thus a computer can make out the hairs of a fur coat, the rough texture of a rock, or the smooth texture of silk. (Herndon, 23)
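The statistical approach can be illustrated with two simple measurements taken over every pair of adjacent pixels.  The function names, the tolerance of 10 gray levels, and the sample "silk" and "rock" patches are hypothetical; they are meant only to show the idea of comparing a pixel with its neighbors, not to reproduce any particular system.

```python
def neighbor_pairs(image):
    """Yield the brightness values of every horizontally or vertically adjacent pixel pair."""
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                yield image[y][x], image[y][x + 1]
            if y + 1 < height:
                yield image[y][x], image[y + 1][x]


def similarity_probability(image, tolerance=10):
    """Estimate the probability that a pixel's intensity is similar to a neighbor's."""
    pairs = list(neighbor_pairs(image))
    similar = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return similar / len(pairs)


def mean_contrast(image):
    """Average brightness difference between neighbors; higher means rougher."""
    pairs = list(neighbor_pairs(image))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Hypothetical patches: nearly uniform "silk" and strongly varying "rock".
silk = [[120, 122, 121],
        [121, 120, 122],
        [122, 121, 120]]
rock = [[30, 200, 30],
        [200, 30, 200],
        [30, 200, 30]]

print(similarity_probability(silk), mean_contrast(silk))
print(similarity_probability(rock), mean_contrast(rock))
```

A smooth surface scores close to 1.0 on similarity and low on contrast, while a rough one does the opposite, which gives the system numbers it can compare between regions of the image.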

Matching with a Model

In order to recognize an object, the computer must match its outline, shape, color, or texture with models stored in its memory.  The most basic way of doing this is to find the stored outline that best matches the outline of the object.  However, the object must be viewed from an angle similar to that of the stored outline.  The cone-like shape of a strawberry seen from the side is completely different from its circular shape seen from the top.  Feature extraction, in which objects are further classified by texture, color, and shape, allows for finer distinctions between objects.  Although a computer could tell the difference between a melon and a banana by shape alone, it would need to analyze texture and color to tell the difference between a melon and an orange.  A more complex system would break objects down into separate components and be able to discern between very similar objects.  For example, if the computer were to analyze two different humans, it could compare the length of their legs or the width of their arms.  To go even further, a computer could also compare how the separate components relate to one another.  The computer could differentiate between one person with his hands on his waist and another with his hands on his head. (Herndon, 27)
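The melon, banana, and orange example can be sketched as a nearest-model search over feature values.  Everything specific here is an assumption made for illustration: the three features (elongation, hue, roughness), their numeric values, and the use of squared distance as the measure of "best match".

```python
# Hypothetical stored models: each object is summarized by a few features.
MODELS = {
    "banana": {"elongation": 4.0, "hue": 0.15, "roughness": 0.1},
    "melon":  {"elongation": 1.1, "hue": 0.30, "roughness": 0.6},
    "orange": {"elongation": 1.0, "hue": 0.08, "roughness": 0.3},
}
ALL_FEATURES = ["elongation", "hue", "roughness"]


def distance(observed, model, features):
    """Sum of squared differences over the chosen features (smaller is a better match)."""
    return sum((observed[name] - model[name]) ** 2 for name in features)


def best_match(observed, models, features):
    """Return the name of the stored model closest to the observed object."""
    return min(models, key=lambda name: distance(observed, models[name], features))


# A long, thin object is identified by shape alone.
banana_like = {"elongation": 3.8, "hue": 0.14, "roughness": 0.12}
print(best_match(banana_like, MODELS, ["elongation"]))  # banana

# A round object: shape alone cannot separate melon from orange reliably,
# and here it picks the wrong one. Adding color and texture corrects it.
round_fruit = {"elongation": 1.06, "hue": 0.09, "roughness": 0.35}
print(best_match(round_fruit, MODELS, ["elongation"]))  # melon
print(best_match(round_fruit, MODELS, ALL_FEATURES))    # orange
```

In this sketch the features are not rescaled, so elongation can dominate the distance; a practical system would normalize or weight the features before comparing them.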

Conflicts with Reality

Although computer-vision systems are used in a number of different fields, most can only perform under very constrained conditions.  Optical character readers (OCR) can recognize letters printed in most typewriter fonts, but cannot recognize various forms of handwriting.  Robots that are involved in manufacturing can discern certain parts from others, but only from a specific perspective; if you turned the robot's camera 20 degrees to the left, it would not be able to recognize any of the parts it understood from its original position.  Even this is only a small part of the problem, as an object's appearance can be altered in many other ways.  We as humans understand that an orange that is peeled is still an orange, or that a tree with wilted leaves during winter can be the same tree with bright green leaves in spring.  Excess illumination, or a lack of it, can also distort the perception of an object and its boundaries.  We might be able to tell that it's the neighbor's dog Rover wandering the streets at sunset, but a computerized vision system might not be able to make heads or tails of it, because the lack of light makes it harder to discern textures and colors. (Herndon, 29)

 
