Now we shall probe deeper into the mysterious science known as statistics. We will begin by entering the jungle of the data set. Since we are in the jungle, there are lots of animals - monkeys (the best), kangaroos, and sharks. The animal types are known as the individuals. Basically, a data set is just a collection of these individuals - a whole lot of them, or just a couple. Each individual also has variables, which are the characteristics the data records about it. For example, there could be a "coolness" variable, in which case the shark would have an 8, the kangaroo a 100, and the monkey a 1 million. Another example is the speed variable: the data set describes the shark as fast, the kangaroo as slow, and the monkey as super fast! Check out this table:
| Animal Type | Coolness | Speed |
|---|---|---|
| monkey | 1 million | super fast |
| shark | 8 | fast |
| kangaroo | 100 | slow |
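If you like seeing things in code, here is a rough sketch of the same table in Python. It is only one possible way to store a data set, not the official one: each dictionary is one individual and each key is a variable. The names `jungle`, `animal`, `coolness`, and `speed` are made up for this example, and "1 million" is written as the number `1_000_000`.

```python
# A tiny data set: one dictionary per individual, one key per variable.
# All names here are made up for illustration.
jungle = [
    {"animal": "monkey", "coolness": 1_000_000, "speed": "super fast"},
    {"animal": "shark", "coolness": 8, "speed": "fast"},
    {"animal": "kangaroo", "coolness": 100, "speed": "slow"},
]

# Walk through the individuals and show the variables recorded for each one.
for individual in jungle:
    print(individual["animal"], individual["coolness"], individual["speed"])
```

Adding another animal to the jungle just means adding another dictionary to the list.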
Now you're probably thinking to yourself: can speed be described in words? Is that valid in a data set? Yes! I'm glad you asked. The difference is this: Coolness, measured with numbers, is called a quantitative variable, while Speed, described with words, is called a qualitative variable (also known as a categorical variable). Qualitative variables put the individuals into categories, while quantitative variables are numerical values that describe the individual. Don't let the big words throw you off. Just think: qualitative - quality description; quantitative - quantity. Simple, no?
Then there is the distribution of a variable, which shows what values the variable takes and how often it takes each one. For instance, if you wanted to catalog the coolness in the jungle, you would write down every animal's coolness level in one big data set. The distribution of the coolness variable would then be the coolness numbers that appear (e.g. 8 for a shark) together with how often each one shows up.
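To make the two ideas concrete, here is a small Python sketch. The scores below are invented for a slightly bigger, hypothetical jungle (three sharks, two kangaroos, one monkey), and the helper names `variable_kind` and `distribution` are made up for this example. The type check is a simplification: it assumes that numbers always mean quantitative, which is not true for things like ID numbers.

```python
from collections import Counter

# Hypothetical values for a bigger jungle (invented for illustration):
# three sharks, two kangaroos, and one monkey.
coolness_scores = [8, 8, 8, 100, 100, 1_000_000]
speeds = ["fast", "fast", "fast", "slow", "slow", "super fast"]

def variable_kind(values):
    """Return 'quantitative' if every value is a number, else 'qualitative'.

    Simplification: assumes numeric values always represent quantities.
    """
    all_numbers = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if all_numbers else "qualitative"

def distribution(values):
    """Map each value the variable takes to how often it takes it."""
    return dict(Counter(values))

print(variable_kind(coolness_scores))  # quantitative
print(variable_kind(speeds))           # qualitative
print(distribution(coolness_scores))   # {8: 3, 100: 2, 1000000: 1}
print(distribution(speeds))            # {'fast': 3, 'slow': 2, 'super fast': 1}
```

Notice that a distribution works for both kinds of variable: it only needs the values and a count of how often each one appears.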
And those are the basics of data sets!