The basic measures of central tendency are mean, median, and mode.
Given a collection of data, a common first question is where the data is centered. Knowing the center, midpoint, or average is a starting point for understanding the data.
Keep in mind that knowing the average of the data alone is not sufficient to draw many conclusions. Also consider the spread, or variance, of the data before making decisions.
The Mean
The mean, $\bar{X}$, or what we commonly call the average, is the total of all the data values divided by the number of data points.
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $x_i$ is each individual data value and $n$ is the number of data values.
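The formula translates directly into a few lines of Python. This is a minimal sketch: the function name `mean` is an illustrative choice, and Python's standard library also provides `statistics.mean` for the same calculation.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the x_i divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("mean requires at least one data value")
    return sum(values) / n


# The 9-value dataset used in the median section.
data = [1, 3, 4, 4, 5, 7, 8, 9, 9]
print(mean(data))  # 50 / 9, about 5.56
```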
The mean is the first moment of the data: it is the center of mass. If the data were equal weights placed along a ruler, the mean would be the balance point, where the total distance of the points below the mean equals the total distance of the points above it. The mean is a very common measure of central tendency.
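The balance-point property can be checked numerically: the signed distances (deviations) from the mean always sum to zero. The sketch below uses the 9-value dataset from the median section and Python's `fractions.Fraction` for exact arithmetic. The variable names are illustrative.

```python
from fractions import Fraction

data = [1, 3, 4, 4, 5, 7, 8, 9, 9]

# Exact fractions keep floating-point rounding from hiding the result.
x_bar = Fraction(sum(data), len(data))  # 50/9

# Signed distance of each point from the mean: negative below it, positive above it.
deviations = [x - x_bar for x in data]

left = sum(d for d in deviations if d < 0)   # total distance below the mean
right = sum(d for d in deviations if d > 0)  # total distance above the mean

print(x_bar, left, right, sum(deviations))  # prints: 50/9 -97/9 97/9 0
```

The two sides cancel exactly, which is what makes the mean the balance point.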
The mean defines the center of gravity (mass) of the data, uses all the data, and requires no sorting.
On the other hand, extreme values may pull the mean away from where the bulk of the values lie, it may be more time consuming to calculate than the median or mode, and the mean may not equal any of the actual data values.
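Both drawbacks are easy to see with Python's standard `statistics.mean`. In this sketch the largest value of the 9-value dataset is replaced by 90. That extreme value is hypothetical and is not part of the original example.

```python
from statistics import mean

data = [1, 3, 4, 4, 5, 7, 8, 9, 9]
print(mean(data))          # 50/9, about 5.56
print(mean(data) in data)  # False: no data point equals the mean

# Hypothetical extreme value: replace the largest value (9) with 90.
with_outlier = data[:-1] + [90]
print(mean(with_outlier))  # 131/9, about 14.56, above every value except the outlier
```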
The Median
The median, $\tilde{X}$, is the middle value when the data is sorted in ascending or descending order. For an odd number of values, the median is the middle value. For an even number of values, the median is the average of the two middle values.
Here are two examples of sorted data; in both cases, the median is 5.
With 9 values,
1, 3, 4, 4, 5, 7, 8, 9, 9
Here 5 is the midpoint of the sorted data, with four values on either side.
And with 10 values,
1, 3, 4, 4, 4, 6, 7, 8, 9, 9
Here 4 and 6 are the two middle values, and their average is 5.
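The sort-then-pick procedure can be written as a short function. This is a minimal sketch with an illustrative name, `median`. The standard library's `statistics.median` follows the same rule for odd and even counts.

```python
def median(values):
    """Return the middle of the sorted data, averaging the two middle values when n is even."""
    ordered = sorted(values)  # the sorting step the median requires
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one data value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


print(median([1, 3, 4, 4, 5, 7, 8, 9, 9]))     # 5
print(median([1, 3, 4, 4, 4, 6, 7, 8, 9, 9]))  # 5.0
```

Because the function sorts its input first, the data does not need to be sorted in advance.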
The median marks the point that splits the sorted data in half. Because it depends only on the middle position, it is not sensitive to extreme values, and it requires little calculation beyond sorting.
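The median's resistance to extreme values can be checked with Python's `statistics.median`. This sketch uses the same hypothetical outlier as before: the largest value of the 9-value dataset is replaced by 90.

```python
from statistics import median

data = [1, 3, 4, 4, 5, 7, 8, 9, 9]
with_outlier = data[:-1] + [90]  # hypothetical extreme value replacing the largest 9

print(median(data))          # 5
print(median(with_outlier))  # still 5: only the middle position matters
```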
The median calculation does require sorting, which may be tedious for large datasets. If the dataset has extreme values, they may be important, yet the median ignores them. It is not meaningful to average the medians of subsets to find the median of the combined dataset. For many distributions, including the normal distribution, the median also varies more from sample to sample than the mean.
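Averaging medians can give the wrong answer, as this small example shows. The two groups are hypothetical and deliberately unequal in size. The calculation uses Python's `statistics.median`.

```python
from statistics import median

group_a = [1, 2, 3]        # median 2
group_b = [4, 5, 6, 7, 8]  # median 6

average_of_medians = (median(group_a) + median(group_b)) / 2  # 4.0
combined_median = median(group_a + group_b)                   # median of 1..8 is 4.5

print(average_of_medians, combined_median)  # prints: 4.0 4.5
```

The combined median has to be computed from the pooled data.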
The Mode
The mode is the most frequently occurring value in the dataset. Note that a dataset can have more than one mode.
In this dataset
1, 3, 4, 4, 4, 6, 7, 8, 9, 9
There are three 4’s, more than any other value, thus the mode is 4.
To determine the mode, simply identify the value that occurs most often. If there is a tie, the dataset has more than one mode. The mode is not influenced by extreme values or outliers, and it is always an actual data value. The mode is easy to identify with a histogram or similar plot.
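Counting occurrences is all that finding the mode requires. This sketch uses `collections.Counter` and returns every tied value. The helper names `modes` and `text_histogram` are illustrative. The standard library's `statistics.multimode` (Python 3.8+) also returns all modes, but in order of first appearance rather than sorted.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in ascending order (more than one when tied)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode requires at least one data value")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def text_histogram(values):
    """Print one row per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    for v in sorted(counts):
        print(f"{v:>3} | {'#' * counts[v]}")


data = [1, 3, 4, 4, 4, 6, 7, 8, 9, 9]
print(modes(data))                         # [4]
print(modes([1, 3, 4, 4, 5, 7, 8, 9, 9]))  # [4, 9]: a tie, so two modes
text_histogram(data)                       # the longest bar marks the mode
```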
The mode may or may not be near the mean or median and there may be more than one mode.
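A small hypothetical dataset shows how far apart the three measures can fall. It has a cluster of 1s plus a spread of larger values. The functions are from Python's `statistics` module.

```python
from statistics import mean, median, multimode

# Hypothetical dataset: three 1s plus a spread of larger values.
data = [1, 1, 1, 6, 7, 8, 9, 10, 11]

print(mean(data))       # mean = 6
print(median(data))     # median = 7
print(multimode(data))  # [1], far below both the mean and the median
```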