::: 1.5 Picking Between the Two :::
Now you must be wondering: which do I use, the five-number summary or the mean and standard deviation? If you recall what I told you earlier, the answer becomes clear. Remember how I said that the mean is easily influenced by outliers? The standard deviation is as well, because it is based on the mean. The two are therefore good descriptors for a symmetric or roughly symmetric distribution, but they can mislead when used with heavily skewed data or data that contains outliers. An example makes this plain. Consider this data set: 0, 0, 0, 0, 100. The mean is 20, which would put the center at 20. That is clearly misleading, since four of the five values are 0. The center should be at 0, and that is exactly what the median gives us. The median's resistance to outliers is the reason to use the five-number summary (which is based on the median) with skewed data.
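If you would like to check the numbers yourself, here is a minimal Python sketch using the standard library's `statistics` module. The helper `five_number_summary` is a hypothetical name introduced only for this illustration, and it assumes one common quartile convention (the median of each half, leaving out the middle value when the count is odd); other textbooks and software use slightly different rules. The standard deviation shown is the population version, which is also an assumption, since the text above does not say which one is meant.

```python
import statistics

# The data set from the example above.
data = [0, 0, 0, 0, 100]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd. Other quartile
    conventions exist and can give different answers.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return (
        s[0],
        statistics.median(lower),
        statistics.median(s),
        statistics.median(upper),
        s[-1],
    )


mean = statistics.mean(data)      # pulled up to 20 by the single value 100
median = statistics.median(data)  # stays at 0, where most of the data sits
pop_sd = statistics.pstdev(data)  # population standard deviation (assumed)

print("mean:", mean)
print("median:", median)
print("population standard deviation:", pop_sd)
print("five-number summary:", five_number_summary(data))
```

Notice that the one value of 100 moves the mean and inflates the standard deviation, while the median does not budge. Under the quartile convention assumed here, the summary still reports the maximum of 100, so the outlier is not hidden; it just does not get to drag the center along with it.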