Previously, we used graphs to describe distributions. Now we will explore the numerical techniques used to describe them. This is especially useful because numbers let us compare one distribution to another concretely.
Earlier, you were told to note the center of a distribution. Now we use numbers to represent that center: the mean and the median. You are probably very familiar with the mean (x̄, read "x-bar") - it is simply the average: add up the values and divide by how many there are. You know, as in your grade? Or your points per game average? This number is very useful for comparing distributions, as in which of ten basketball players scores the most points per game, or which of one hundred computer geeks types the most words per minute. The median (M) also describes the center of a distribution, but in a different way. It is the midpoint of the values once they are sorted: half of the observations fall below it and half above (with an even number of values, it is the average of the two middle ones). To clarify, consider the set 2, 3, 6, 9, 10. The median is 6. The median is especially useful for describing a distribution when large outliers distort the mean.
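To see why the median holds up when an outlier distorts the mean, here is a small sketch using Python's standard `statistics` module. The first list is the set from the text. The second list, where the 10 is swapped for 100, is a made-up example for illustration only.

```python
from statistics import mean, median

# The set from the text.
values = [2, 3, 6, 9, 10]
print(mean(values))    # 6
print(median(values))  # 6

# Hypothetical variation: replace the largest value with an outlier.
with_outlier = [2, 3, 6, 9, 100]
print(mean(with_outlier))    # 24  (pulled upward by the outlier)
print(median(with_outlier))  # 6   (unchanged)
```

The mean jumps from 6 to 24 because every value contributes to the sum, while the median only cares about which value sits in the middle.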
We also use a method called the five-number summary to describe the spread of the distribution. It is built around the median and consists of five values: the minimum, the first quartile (Q1), the median (M), the third quartile (Q3), and the maximum. The first quartile is a number Q1 that has one fourth of the observations below it, while the third quartile Q3 has three fourths of the observations below it. Together with the extremes, the quartiles describe the spread of the distribution.
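As a rough sketch, the five-number summary can be computed by hand in Python. This assumes one common textbook convention: Q1 is the median of the observations below the overall median's position, and Q3 is the median of those above it. Other conventions exist and can give slightly different quartiles, so treat the function name `five_number_summary` and this choice of rule as illustrative rather than definitive.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data, leaving out the middle value when the
    number of observations is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median position
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# The set used earlier in the text.
print(five_number_summary([2, 3, 6, 9, 10]))  # (2, 2.5, 6, 9.5, 10)
```

Here the lower half is 2, 3 and the upper half is 9, 10, so Q1 = 2.5 and Q3 = 9.5 under this convention.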
A graphical interpretation of the five-number summary is seen in the boxplot. The box portion of the boxplot spans Q1 to Q3, effectively visualizing the spread of the middle half of the distribution. A line inside the box marks the median, and the "whiskers" (hence the graph's affectionate nickname, the box-and-whisker plot) extend from the quartiles out to the extremes. Outliers, when marked separately, are plotted as individual points.
While the boxplot and the five-number summary are based on the median, there is also a way to measure spread based on the mean: the standard deviation. The standard deviation (s) measures spread around the mean. It is zero when there is no spread and grows as the spread increases. A more in-depth look at standard deviation will come further on. You're excited, I know!
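As a quick preview, the two properties just mentioned can be checked with `statistics.stdev` from Python's standard library. This assumes that s means the sample standard deviation (the version that divides by n - 1), which this section does not spell out. The three data sets are made up for illustration, and all share a mean of 5.

```python
from statistics import stdev

# Hypothetical data sets, each with mean 5 but increasing spread.
no_spread = [5, 5, 5, 5]
some_spread = [4, 5, 5, 6]
more_spread = [1, 5, 5, 9]

# stdev computes the sample standard deviation (divides by n - 1).
s_none = stdev(no_spread)
s_some = stdev(some_spread)
s_more = stdev(more_spread)

print(s_none)           # 0.0, since every value equals the mean
print(s_some < s_more)  # True, since wider spread gives a larger s
```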