Time Series Analysis 1 : Least-Squares Regression

What is time series analysis? It's a means by which economists (and mathematicians, and scientists, and others) attempt to look at a set of data (usually recorded over time, thus the name time series) and see why it looks the way it does.

For an example, refer to the CurveFit demo. We have it in both an ActiveX control and a Java applet, so use whichever you prefer. What does the CurveFit demo do? It takes points that you click, and draws the line of best fit through the points. By doing this, it assumes that your data are fairly linear, that is, the points almost lie on a straight line. If the data are linear, then linear regression works well. If not, a more sophisticated technique is necessary.
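As a rough illustration of what a demo like CurveFit might compute, here is a minimal Python sketch of the line of best fit through a handful of points. The function name `fit_line` and the sample points are made up for this example; nothing here is taken from the demo's actual code.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points.

    points is a list of (x, y) pairs. This is an illustrative helper,
    not the CurveFit demo's own implementation.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Spread of x about its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are identical; no unique line exists")
    slope = sxy / sxx
    # The best-fit line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical "clicked" points that lie roughly on a straight line.
clicked = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
slope, intercept = fit_line(clicked)
print(f"y = {slope:.3f} x + {intercept:.3f}")
```

If the points curve noticeably, the line returned here will still be the best straight line, but it may describe the data poorly.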

The most basic type of time series analysis is linear-least-squares regression. Why linear? Because the function you fit is a linear combination of other functions: each one is multiplied by a constant coefficient, and the results are added together. For example, a linear fit uses ax + b, a linear combination of the functions x and 1. A quadratic fit uses ax^2 + bx + c, a linear combination of the functions x^2, x, and 1. It is important not to confuse the general linear-least-squares regression with linear fitting (linear regression). It is rather unfortunate that such similar names are used for each. Linear-least-squares means you are dealing with a linear combination of functions; linear fitting means you're trying to fit a line to your data. To avoid this confusion, for the remainder of our discussion we'll simply refer to least-squares regression, omitting the word "linear."

Why least-squares? For each data point (x, y) - since we're considering time-series analysis, x may represent time - our fitting method gives us a new estimate for y; let's call it y*. The error in our estimate could be measured by the difference (y* - y), known as the residual, but this number could be positive or negative, and if we add up the differences for every data point, values could cancel out. To remedy this problem, we use the squared residual (y* - y)^2 as our error, which is never negative.
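The cancellation problem is easy to see with a tiny Python sketch. The numbers below are invented for illustration: the "fit" is deliberately poor (it predicts the same value everywhere), yet its signed differences add up to zero.

```python
# Observed y values and a deliberately poor set of estimates y*
# (a constant prediction). Both lists are made up for this example.
ys = [1.0, 2.0, 3.0, 4.0]
estimates = [2.5, 2.5, 2.5, 2.5]

# Signed differences (y* - y): positive and negative values cancel.
differences = [y_star - y for y_star, y in zip(estimates, ys)]
sum_signed = sum(differences)

# Squared differences (y* - y)^2: every term is non-negative,
# so a bad fit cannot hide behind cancellation.
sum_squared = sum(d ** 2 for d in differences)

print("sum of signed differences: ", sum_signed)
print("sum of squared differences:", sum_squared)
```

The signed sum suggests a perfect fit, while the squared sum shows that the estimates miss every point.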

So, let us now state the goal of least-squares regression: given a set of functions f_1, f_2, f_3, ..., f_n, and a set of data points (x_i, y_i), we wish to find coefficients c_1, c_2, c_3, ..., c_n and a function

f = c_1 f_1 + c_2 f_2 + c_3 f_3 + ... + c_n f_n

such that the sum over i of (f(x_i) - y_i)^2 is a minimum.
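One way to find such coefficients is sketched below in plain Python. It takes a technique the text above does not name: building the so-called normal equations and solving them by Gaussian elimination. The function names `solve_linear_system` and `least_squares`, and the sample data, are assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the basis functions are "
                             "not independent on this data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(basis, xs, ys):
    """Return coefficients c_1..c_n minimizing sum_i (f(x_i) - y_i)^2,
    where f = c_1 f_1 + ... + c_n f_n and basis = [f_1, ..., f_n]."""
    n = len(basis)
    # Row i of the design matrix A holds f_1(x_i), ..., f_n(x_i).
    rows = [[f(x) for f in basis] for x in xs]
    # Normal equations: (A^T A) c = A^T y.
    ata = [[sum(row[j] * row[k] for row in rows) for k in range(n)]
           for j in range(n)]
    aty = [sum(row[j] * y for row, y in zip(rows, ys)) for j in range(n)]
    return solve_linear_system(ata, aty)


# Quadratic fit: a linear combination of the functions x^2, x, and 1.
quadratic_basis = [lambda x: x * x, lambda x: x, lambda x: 1.0]
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 5.0, 10.0, 17.0]  # made-up data lying on y = x^2 + 1
coeffs = least_squares(quadratic_basis, xs, ys)
print("coefficients for x^2, x, 1:", coeffs)
```

Swapping in a different list of basis functions gives a different kind of fit without changing `least_squares` itself. The normal equations can lose accuracy when the basis functions are nearly dependent, so for real work a library routine such as NumPy's `numpy.linalg.lstsq` is usually a safer choice.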

Common types of regression (found, for example, on graphing calculators) include:

Wow! That should solve everything, right? There are a few problems with it, though:

Next we'll look at another way to analyze a time series: breaking it down into cycles of different lengths. If you want to learn more about regression, consult a good book on statistics.
