MEASURES OF CENTRAL TENDENCY
AND VARIABILITY
Three pieces of information are typically used to summarize subjects'
performance in a set of data: the shape of the distribution of scores
(symmetrical, positively skewed, or negatively skewed), its "average"
or typical score (e.g., mean, median, or mode), and the spread or
variability of the scores in the distribution (e.g., range, variance,
and standard deviation).
The shape of the distribution of scores is reflected in the relationship
among the "average" or typical scores in that distribution.
The term "average" itself can be confusing because three distinct
measures can be used to define the average or typical score in a
distribution of scores. The intent of each measure is to identify a
score that appropriately represents the typical score in the data. In
general, these measures identify a point near the center of the
distribution, which is why they are called "measures of central
tendency." They are the mean, median, and mode.
Although each measure of central tendency attempts to identify the
most typical score in that distribution of scores, each measure has its
own interpretation of the most typical score. The mean defines central
tendency as the arithmetic average of all the scores, that is, their
sum divided by the number of scores (a measure that you are very
familiar with). The median defines central tendency
as the point where half the scores fall above that value and half the scores
fall below it. Finally, the mode defines central tendency as the
most frequently occurring score in that distribution of scores.
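The three definitions above can be illustrated with a minimal sketch
using Python's standard statistics module. The scores are hypothetical
and were chosen only so that the arithmetic is easy to check by hand.

```python
from statistics import mean, median, mode

# Hypothetical test scores, chosen only for illustration.
scores = [70, 75, 80, 80, 85, 90, 94]

score_mean = mean(scores)      # sum of the scores divided by their count
score_median = median(scores)  # middle score once the scores are sorted
score_mode = mode(scores)      # most frequently occurring score

print("mean:", score_mean)
print("median:", score_median)
print("mode:", score_mode)
```

For these scores the mean is 82, while the median and the mode are
both 80, so the three measures need not agree even on a small data set.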
The two most widely used measures of central tendency are the mean
and the median. Although the mode is also a measure of central tendency,
its use is usually limited to describing qualitative data. When a
measure of central tendency must be selected, the choice is usually
between the mean and the median. Which measure should be chosen? Such
questions often arise in statistics, since there is usually more than one
statistical method available for dealing with a problem. However,
this does not imply that all methods are equally acceptable for a given
set of data. The correct choice will depend, in part, on the type
of data being analyzed (qualitative or quantitative), the shape of the
distribution of scores, and the question being asked.
If the data being analyzed is qualitative, then the only measure of
central tendency that can be reported is the mode, because categories
can be counted but not averaged. However, if the data is quantitative
in nature (ordinal or interval/ratio), then the mode, median, or mean
can be used to describe the data.
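As a small illustration of the qualitative case, the sketch below
counts how often each category occurs and reports the mode. The
category labels and the variable names are hypothetical.

```python
from collections import Counter
from statistics import mode

# Hypothetical qualitative data: each subject's preferred study method.
methods = ["flashcards", "notes", "flashcards", "groups", "notes", "flashcards"]

counts = Counter(methods)    # frequency of each category
most_common = mode(methods)  # category with the highest frequency

print(counts)
print("mode:", most_common)
```

Because the categories are labels rather than numbers, there is nothing
to add up or to sort into a meaningful middle, which is why only the
mode applies here.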
With quantitative data, the shape of the distribution of scores (symmetrical,
negatively or positively skewed) plays an important role in determining
the appropriateness of the specific measure of central tendency to accurately
describe the data. If the distribution of scores is symmetrical or
nearly so, the median and mean (as well as the mode) will be very close
to each other in value. In this case, the mean is the value of central
tendency that is usually reported. However, if the distribution of
scores is positively or negatively skewed, the mean will tend to either
overestimate (in positively skewed distributions) or underestimate (in
negatively skewed distributions) the true central tendency of the
distribution. In extreme cases of skewed data, the mean can lie at
a considerable distance from most of the scores. Therefore, in skewed
distributions, the median tends to represent the data more accurately
than the mean, because the median can never have more than one half of
the scores above it or below it.
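The effect of skew on the mean can be seen in a short sketch. Both
sets of scores are hypothetical; each contains a single extreme value
that pulls the mean away from the bulk of the scores.

```python
from statistics import mean, median

# Hypothetical positively skewed scores: one extreme high value.
pos_skewed = [20, 22, 25, 27, 30, 32, 200]
# Hypothetical negatively skewed scores: one extreme low value.
neg_skewed = [5, 88, 90, 92, 94, 96, 98]

for label, data in (("positive skew", pos_skewed), ("negative skew", neg_skewed)):
    m, md = mean(data), median(data)
    above = sum(x > m for x in data)  # number of scores above the mean
    below = sum(x < m for x in data)  # number of scores below the mean
    print(f"{label}: mean={m:.1f} median={md} above mean={above} below mean={below}")
```

In the positively skewed set, six of the seven scores fall below the
mean, so the mean overestimates the typical score; in the negatively
skewed set the pattern is reversed. In both cases the median stays
within the main cluster of scores.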
To describe data solely by its measure of central tendency, however,
can be quite misleading. Two distributions of scores may have the
same mean, median, and mode but differ in their variability or dispersion
of scores. That is, the scores in one distribution may tend to cluster
more closely around the measure of central tendency than the scores in
the other distribution. To describe a distribution further, another
statistical measure, in addition to a measure of central tendency, is
needed to reflect the amount of spread or variability of the scores.
Statisticians have suggested several measures, called measures of
dispersion (variability), that indicate the spread of the scores in
any distribution.
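A minimal sketch of this point, using two hypothetical sets of scores
that share the same mean, median, and mode but are spread very
differently around that common center:

```python
from statistics import mean, median, mode

# Two hypothetical sets of scores with identical central tendency.
clustered = [48, 49, 50, 50, 51, 52]
spread_out = [20, 35, 50, 50, 65, 80]

for name, data in (("clustered", clustered), ("spread out", spread_out)):
    print(name, "mean:", mean(data), "median:", median(data),
          "mode:", mode(data), "lowest:", min(data), "highest:", max(data))
```

Every measure of central tendency equals 50 for both sets, yet the
first set runs only from 48 to 52 while the second runs from 20 to 80.
A measure of central tendency alone cannot distinguish them.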
As with measures of central tendency, different measures of dispersion
are appropriate for different problems. The most common measures of
dispersion are the range (the difference between the highest and lowest
scores), the variance, and the standard deviation (the square root of
the variance). The appropriateness of each depends, in part, on the
type of data that you have and on which measure of central tendency you
are using. If the data is qualitative, then there is no measure of
variability to report. For data that is quantitative (ordinal and
interval/ratio), all three measures are possible. However, the shape of
the distribution of scores and the measure of central tendency reported
will determine which measure of variability to use. If the distribution
of scores is symmetrical, then the measures of variability usually
reported are the variance and standard deviation, although the standard
deviation is more interpretable because it is expressed in the same
units as the scores. However, if the data is skewed, then the
appropriate measure of variability is the range.
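The three measures of dispersion can be computed as in the sketch
below. The data are hypothetical, and the sketch assumes the population
forms of the variance and standard deviation (dividing by the number of
scores); the text does not say whether the population or the sample
form is intended. Python's statistics module provides variance() and
stdev() for the sample forms.

```python
from statistics import pvariance, pstdev

# Hypothetical scores; their mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # highest score minus lowest score
data_variance = pvariance(data)     # mean squared deviation from the mean
                                    # (population form, an assumption)
data_std = pstdev(data)             # square root of the variance

print("range:", data_range)
print("variance:", data_variance)
print("standard deviation:", data_std)
```

Here the range is 7, the variance is 4, and the standard deviation is
2. The variance is in squared units, whereas the standard deviation is
back in the units of the original scores.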
In summary, with qualitative data, the only measure needed to describe
the data is the mode. With quantitative data, the mean, variance, and
standard deviation are appropriate for symmetrical distributions, while
the median and range are appropriate when the distribution is skewed
(either positively or negatively).
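The summary can be written down as a small lookup, sketched below. The
function name suggest_measures and its arguments are hypothetical; the
function only restates the guidelines given in this section and is not
a general statistical rule.

```python
def suggest_measures(data_type: str, skewed: bool = False) -> dict:
    """Restate this section's guidelines (hypothetical helper).

    data_type: "qualitative" or "quantitative"
    skewed:    True if the distribution is positively or negatively skewed
    """
    if data_type == "qualitative":
        # Only the mode applies; no measure of variability is reported.
        return {"central_tendency": ["mode"], "variability": []}
    if data_type == "quantitative":
        if skewed:
            return {"central_tendency": ["median"], "variability": ["range"]}
        return {
            "central_tendency": ["mean"],
            "variability": ["variance", "standard deviation"],
        }
    raise ValueError(f"unknown data type: {data_type!r}")


print(suggest_measures("quantitative", skewed=True))
```

Deciding whether a distribution counts as skewed is left to the
analyst, for example by comparing the mean with the median or by
inspecting a plot of the scores.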