Stereograms : A COOL application of Geometry - Technical Explanation

[DIVIDER]

!!CAUTION: Graphically Intensive Page - Please View in at least 800x600 Resolution!!

Before reading my exposition on the subject of Stereograms, please download my Java program StereoGrinder (and the JDK) so you can follow along with the real source code and make some real stereograms yourself!

NOTE: In the examples, I will write in pseudo-Java. Some things will be simplified, such as casting. All of the messy code that actually works (which, by the way, has a slightly different and less clear naming scheme) is found in the ZIP file that you can download.
[Scan plane intersecting a ball]The very basic principle of Stereograms and how the whole 3D effect works can be summed up in the two pictures at left and below, representing the a scan plane of an image.

Axes: We are assuming that the Z-axis goes into the monitor (how far in front of or behind screen), the X-axis goes horizontal - with the scan lines (the horizontal rows of pixels), and the Y-Axis is the up and down on the screen. Stereograms are generated one row at a time, so for the most part, we ignore the whole y-end of it and view the art of making stereograms in the 2D representation in the picture.

[Simple graphical description of 3D effect]

How it works: As you can see in the picture, two points of the same color are generated on the screen for each corresponding point on the surface. So, when your eyes converge behind the screen, the two dots, which are the same color, converge to one point at a certain depth in your mind's eye.

[Eyes and what they can see of the ball cross section] [Eyes looking at the cross section of a ball] The Depth Field: To generate a stereogram, we first need an array of depths on which we are looking (from here on, a depth field). Because of the nature of stereograms, you can only see one side of something - that is, if you're looking at a building (sample 2), you can't see the whole building, but just the building from one angle, just as you can't see the whole building in real life. Therefore, to generate stereograms, we only use the portions of an object that you can see, that being one side. Look at the images to right to see what I mean. We don't need to see anything beyond the first Z-value for any X and Y values because, at least in our simplified system, it's obscured by the first Z-value. So, because we only need one Z-value for each X and Y value, we can use an MxN matrix (aka array) for our depth field. Therefore, we can draw an image in a Painting program, and then extrapolate the depth array directly from the pixels. The image scanline used for the cross-sections described in the two diagrams at right would look like the following:

From Images to Depth Fields: To left is the image I used to generate a depth field for a ball (actually, it generated a cone because it wasn't the kind of gradient I needed, but let's not argue)

The way I chose to do this was to paint the depths in greyscale. I used white to mean the closest to your eyes and black to mean the furthest away, that is, black had a lower corresponding Z-value than white. This is a common technique in the back of Magic Eyetm books to tell you "what it is" you're supposed to be seeing, so I sort of copied it from there. If you've ever watched the video-on-CD that comes with the game Myst, you may remember that this is also the technique they used to draw the island.

To get the image from greyscale to the depth field, we take the range of values for each pixel (0-255) and map it to the range for our depth field (0-czField in diagram below) and subtract that from the maximum Z-distance with the equation:
depth = czEyesScr+czScrF+czField-czField*pixelColor/255

A Refined View of our Scan Plane: Before we continue, let us refine our picture of a scan plane to our final view of it - you may notice, if you've read the Stereogram FAQ, that we're taking the more mathematically complicated "fixed view" rather than the "floating view" of the eyes. This means that our stereograms will be mathematically correct, but designed to be only viewed straight-on (with the viewer's head center-aligned horizontally with the stereogram. So, without further ado, here is the final picture we will use:


Those red-capped bars represent the depthcells in the current scanline. It's not just you -- I just coined the term depthcell. As pixel means "picture element," I will use depthcell to mean "depth array element," meaning an element that records the Z-depth for its position in the Depth Array. You can also look at a depthcell as the bar extending from the back plane of the depth field to some depth within the depth field. For example, the depthcell on the far left would be generated from the far left pixel in the current line of the image. That far left pixel would be white and hence has the smallest Z-component that an element in the depth field can have. The next depthcell to the right would be generated from a pixel that is dark to mid grey.

The relative coordinates of all of the key points are described in the following diagram. All of the points on the left side have the same Z-coordinates and the opposite (negative) X-coordinates as the right hand counterparts. All coordinates in the diagram are of the form (x,z).



A Stereogram with Edge Scatter
A Clean Stereogram
Edge Scatter and Depth Field Padding: The problem is that the depthcells we get from the greyscale image only occupy the extent that both eyes can see of the Front plane of the Depth field, and thus leave spots without depthcells to either side. The reason we don't just let them strech the entire length of the back plane is because we don't want to lose any data. If the depthcells were allowed to extend the length of the Depth Field the left eye couldn't, for instance, see the far left depthcell in the diagram. The adverse effect of constraining where the depthcells are is that when the depthcells near the sides of the original depth field are low, your view extends to that void space without any depthcells like it is on the right in the diagram. This leads to nothing being generated for the edges of the stereogram, which looks terrible. What happens is that the eyes don't have anything to see at the edges and the whole illusion of depth scatters away. Thus, what we have to do is extend the number of depthcells per scan line by padding the left and right of the scan line with some extra depthcells, which I choose to be of the maximum depth (flat with the back). We will need to add the same number of padding cells to both the left and right so our image centers and our math will work out well.

[The pixels generated by brute scanning and writing][Brute scanning and writing]The Base Image of a Stereogram: If we were to go along our merry way and process each scan line and for each point generate two points of some random color on the screen for each point in our depth array(next discussion), we will not get what we want. What happens is that you overwrite what you have generated by one pixel with what you generate for another. This is exaggerated a bit in the diagrams because of the few number of pixels, but you see what can happen. Pairs become unusable because half or all of them are overwritten. We can limit the effect towards the edges by making sure we get only one depth per left eye pixel, which we will do for efficiency later, but we still have the problem of where the left and right projections "criss-cross" onto the same pixels.
What we do, then, is to first generate some base image either by plotting dots of random color throughout the image or by asking the user for some sort of image file to tile to make the background. (See
my program!) What we will do to make the 3D pairs is to go from left to right across the depth field and set each right pixel to what color is already at the left pixel. Thus, the program would never overwrite what was already in a left position, and have very little overwrite at all. The overwrite the stereogram would have would happen when depth changes suddenly and the only one I could actually see it in real life (look at the white depthcell in the diagram). This we will consider in the Ghost Images section.

Getting Ready to Generate a Scan Line of the Stereogram: In addition to the variables you see in the diagrams, which are all of type double (64 bit IEEE floating point, i.e. real numbers), except for cxImage (an integer), we will also use:

   double orig_cxBack     //The pure virtual width of the back plane, before it is modified for
                          // the extra depthcell padding
   double cxBack;         //(width of a depthcell)*(the number of depthcells we need)
   int orig_dA_Width;     //Width of the original depth array, without the padding
   int dA_Width;		  //Width of depth array, including the cell padding
   int dA_Height;		  //Height of depth array
   int depthArray[orig_dA_Width][dA_Height];
                          //The original depthArray - an array that is loaded, usually
                          //from a greyscale image
   int scanlineDepths[];
                          //This will be used for the scanline processing
                          //it will have dA_Width elements.
   int imageArray[][];    //The array for the image - in [Y][X] form
                          //It will be cyImage by cxImage in the end  

   double depthcellWidth; //The width of one depthcell in virtual coordinates (diagram coords)
   int addedDepthcells;   //The number of padding depthcells we should add, total
   int cyImage;           //The height of the actual Stereogram we will generate   
To generate our depth array from our image, the first thing we need to do is to set up our variables:
   cxFront = cxBwtEyes 
             + (cxImage-cxBwtEyes)*(czEyesScr+czScrF)/czEyesScr;
   orig_cxBack  = cxBwtEyes
             + (cxImage-cxBwtEyes)*(czEyesScr+czScrF+czField)/czEyesScr;
   depthcellWidth = cxFront/orig_dA_Width;

   addedDepthcells
      = Math.ceil(orig_cxBack/depthcellWidth) - orig_dA_Width;

   //We want  the number of added Depthcells to be even
   if(addedDepthcells%2!=0)
      addedDepthcells++;

   dA_Width = orig_dA_Width + addedDepthcells;
   cxBack = dA_Width*depthcellWidth;

   cyImage = (dA_Height*cxImage/dA_Width);
   cyImage = cyImage*
   (cxBwtEyes + (dA_Width*depthcellWidth-cxBwtEyes)*czEyesScr
    /(czEyesScr+czScrF+czDepth) )/cxFront;

   scanlineDepths = new double[dA_Width];
   imageArray = new int [cyImage][cxImage];
The first two lines of code, initializing cxFront and cxBack are based on the Elements, Book VI, Proposition 2. This proposition states that if you take a triangle ABC and construct a line parallel to the base that intersects the other two sides at points D and E, then the triangle ABC is similar to triangle ADE - all of the sides are proportional. To find cxFront and cxBack, we take the middle cxBwtEyes units out of the diagram so the two eyes meet and form a triangle. We then add a height and get the diagram to left. The ratios we then extract are AD:BC = AG:EF = AJ:HI. Translated to our coordinate system, this means that:
(cxImage-cxBwtEyes):(cEyesScr)
    = (cxFront-cxBwtEyes):(czEyesScr+czScrF) 
    = (cxBack -cxBwtEyes):(czEyesScr+czScrF+czField)

Taking it one ratio at a time, converting to fractions, and solving for cxFront and cxBack, we derive the first two lines of code.
The rest is simply setting up the data that described how many depthcells are in each scanline, including the padding depthcells, making sure that number is even, and calculating the new width of the back.
[The projection of viewed area onto the image plane] [Geometerized projection]

The lines to calculate cyImage are a bit odd, and I'll have to walk you through them. In fact, I did not catch an error in these lines until I wrote this article. I had been using an approximation that had left my images, on average, 115% taller than they should have been! First, the program calculates how high the image should be purely based on the ratio of width to height in the depth array. It then multiplies that by the ratio of KL to BC (see diagram). This second calculation is necessary to "square up" the image so that a ball looks round and not ovalish. The reason behind this is that, if a depth array is low (close to the back plane) near the edges, then there are those areas to the edges with no data (as we talked about before). Thus, when the sides of our original image are low, our actual data will only occupy that space between the green lines in the diagrams. So, to shrink the height to be square with the fraction of the entire width that the data occupies, we multiply it by that fraction, which is KL/BC. The formula is directly derived from that ratio by filling in the coordinate data. Originally, what I did was multiply the height by cxFront/cxBack because I had not actually drawn the diagram, and thought that that fraction was correct. The correct fraction was much more complicated.
The final two lines simply allocate the arrays we will use for the depths and the image. I use an array of integers for the imageArray because integers are 32 bits in Java, which hold an RGB triple quite nicely (see the Java docs).

After setting these variables, we would either set all of the pixel in our imageArray to random nonsense or to some tiled pattern set by the user. (See my program!)

From here we will load each scanline, one at a time, into our scanline depth buffer and work with it:

for(int y=0; y<cyImage; y++)
   {
   for(int g=0; g<cxImage; g++)
      {
      scanlineDepths[g] = czEyesScr+czScrF+czField;
      }
   for(int g=0; g<orig_dA_Width ;g++)
      {
      scanlineDepths[addedDepthcells/2+g] = depthArray[y][g];
      } 

   int stepX = (czScrF+czEyesScr)/czEyesScr;

   for(double curX = -orig_cxBack/2.0; curX < orig_cxBack/2.0; curX+=stepX)
      {
      int curDApos = Math.floor((cxBack/2.0+curX)/depthcellWidth);
      double distToLeftEye = (curX - (-cxBwtEyes/2.0));
      double distToRightEye = (curX - cxBwtEyes/2.0);

      int imgXa = (int)((cxImage - cBwtEyes)/2.0 +
                  distToLeftEye*czEyesScr/scanlineDepths[curDApos]);
      int imgXb = (int)((cxImage + cBwtEyes)/2.0 +
                  distToRightEye*czEyesScr/scanlineDepths[curDApos]);

      imageArray[y][imgXb] = imageArray[y][imgXa];
      }   
   }

In the loop, we first set all of our depthcells to the back of the depth field and fill in those center (non-padding) depthcells with data from the original depthArray. Following that, we iterate over the depth field. Instead of making a projection for each depthcell, we make a projection for each pixel in the image. Why? The pixels are what count! Imagine there are a hundred depthcells, add twenty for padding. Since we can set the width of our image to anything we want, let us also say that we want an image that is 450 pixels wide. This is fine and good, but if we're only going to project 120 depthcells, we certainly aren't going to have a continuous depth effect over our 450 pixels, but have what seem like wavy lines of depth every four pixels or so. So we project approximately one depthcell per pixel from the view of the left eye by starting at the left end of the back and incrementing our x position by stepX until we're past the right end of our array of the back. But how to gauge how big a step we should take? First of all, should our steps all be the same?


The problem simplifies to a question of whether, since our pixels are evenly spaced, the steps in the back are also evenly spaced. Look at the proof that these steps are, in fact, equal, here. Now that we know that the steps we should take should take, I will venture to say that there are some places where [An infinitely small step between depthcells of drastically different depths]*NO* step would be small enough to make sure that each left-eye pixel is accounted for - the reason why is to the right. You might ask - why don't we just project the points up the side of the bar? Well, it would make our program exceedingly complicated. Not nearly as complicated as a ray-tracer, but complicated nonetheless. We will deal with one such problem below, but I thought that tackling this one wouldn't be so important right now. Besides, we're mainly looking down on this thing anyway, so this problem shouldn't be to much. So what we're left with is to decide what a good enough step will be. If all of the depthcells are far away, the projection of one pixel's width onto the back plane will do (see the Evenly-Spaced picture above). But, if all of the depthcells are as near to us as they can be, we will have to have a smaller step (higher resolution). Assuming that we have a fairly smooth depth field, this smaller step will do. If we have some big jumps in the picture, there is no simple way to take care of it, as described above. So, to find this smaller step, we will simply project a unit of one on the Screen to the front plane. And that's how we get stepX. Now that we know how we get the step, we can explain the loop. We start at -orig_cxBack/2.0 and not -cxBack/2.0 because cxBack contains the extra padding which may have spilled over because orig_cxBack is probably not evenly divisible by depthcellWidth. (A refresher of our coordinates at this point might be in order) The main loop then iterates over a bunch of depthcells by incrementing curX and projecting each point until we get to the other end. For each point, we calculate the index of the coordinate in our depth arrray and then use Book VI, Proposition 2 of the Elements to project the difference in x of both of the eyes and the curX from the back plane onto the screen. The only thing that might be new to you would be the use of the mathematical floor function. The rest is pretty straight-forward.

[Depthcell that only one eye can see]Ghost Images: If we take the naive approach and simply go across as we have said and at each point, generate the two points, we may end up with a strange effect called ghost images - where we generate two dots for a point that, in reality only one eye can see. Thus, we "see" something we're not supposed to. To counter this problem we have to figure out which points we can see and which we can't (or only one eye can).

[Slopes to the edges of depthcells and the shadow areas they generate]
The way I solved it was by calculating the slopes at which the left eye sees the right edge of each depthcell to its right and the slopes at which the right eye sees the left of each depthcell to its left. If, going right from and viewing from the left eye, the ratio of breadth to depth were smaller for the current point than for the right edge of any of the preceding depthcells, then the view of that point is obstructed. Similarly, if, going left from and viewing from the right eye, the ratio of breadth to depth were smaller for the current point than for the left edge of any of the preceding depthcells, then the view of that point is obstructed. So what we do is to construct two arrays - one for the slopes to the left edges from the right eye and one for the slopes to the right edges from the left eye. Starting at the index of the depthcell right under each eye, we scann to the side, and keep track of what the largest slope (ratio breadth to depth) is. For each depthcell, we record what the largest slope is. Having filled in the slope information for the line, we simply test each point when we are getting ready to project it to make sure that a point is visible by both eyes. If the point is to the left of the left eye, we just have to test the slopes from the right eye. If the point is to the right of the right eye, we just have to test the slopes from the left eye. If, however, the point is between the two eyes (in x), then we must test the slopes from both eyes. You might ask, well, if we can get this sort of error when depthcells change drastically while we're scanning in x, then what about the changes between scanlines? I mean, suppose I have a greyscale image of a white square with a black border (like looking straight down on a rectangular prism). And I can't see the pixels immediately to the left and right of the prism because they are only visible from one eye, then what about the pixels immediately to the top and bottom, shouldn't we check to see if we can see those? Well, we probably should, but that would require a much larger amount of memory, would be more difficult to program, and would be much slower. Besides, the stereoscopic effect is really happening in the horizontal effect because we're taking horizontal scanlines, and the vertical changes are less important. The code for all of this can be found with StereoGrinder, which can be
DOWNLOADED.
-David


Back to Cool Java Page Back to Main Geometry Page Back to Main IMO Page