Not All Averages Are Equal
When someone says "the average," they usually mean the mean — the sum of all values divided by the number of values. But the mean is just one of three common measures of central tendency, and in many real-world situations, it's actually the wrong one to use. Choosing the right average can be the difference between insight and distortion.
The Mean: Familiar but Sensitive
The arithmetic mean is calculated by adding all values and dividing by the count. It's intuitive and mathematically convenient — which is why it dominates.
But the mean has one significant weakness: it's highly sensitive to outliers. A single extreme value can pull the mean far from the center of the data.
Classic example: imagine five people in a room with annual incomes of $30,000, $35,000, $40,000, $45,000, and $1,000,000. The mean income is exactly $230,000 ($1,150,000 ÷ 5), a figure that doesn't accurately represent anyone in the room. The four "typical" earners are all well below that mean, and the fifth is far above it.
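A minimal Python sketch of the calculation, using the five incomes from the example. The function name `arithmetic_mean` is our own and not from any library. The sketch also counts how many values fall below the mean and shows how far the mean drops once the single outlier is removed.

```python
# Incomes from the example above, in US dollars.
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]

def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

mean_income = arithmetic_mean(incomes)
below_mean = [x for x in incomes if x < mean_income]

print(mean_income)      # 230000.0
print(len(below_mean))  # 4

# Dropping the one outlier moves the mean back into the cluster of typical earners.
print(arithmetic_mean(incomes[:-1]))  # 37500.0
```

One value out of five is enough to move the mean from $37,500 to $230,000.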
Use the mean when: data is roughly symmetrically distributed without extreme outliers (e.g., heights, test scores in a normal classroom setting).
The Median: The True Middle
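The median is the middle value when the data is sorted in order. For an even number of values, it's the average of the two middle values. Because it depends on the order of the values rather than their size, it is highly resistant to outliers: making the largest value even larger leaves the median unchanged.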
Returning to the income example: the median income is $40,000 — a figure that actually reflects the experience of the typical person in that room.
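The sort-and-pick rule, including the even-count case, can be sketched in Python as follows. `median` here is a hypothetical helper written for illustration; Python's standard library offers `statistics.median` for the same job.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]
print(median(incomes))  # 40000

# Making the outlier ten times larger leaves the median unchanged.
print(median([30_000, 35_000, 40_000, 45_000, 10_000_000]))  # 40000

# Even count: the median is the average of the two middle values.
print(median([30_000, 35_000, 40_000, 45_000]))  # 37500.0
```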
This is why economists and housing analysts typically report median household income and median home prices rather than means. In skewed distributions (like income, wealth, or property values), the median is almost always the more honest summary.
Use the median when: data is skewed, contains outliers, or when you want to describe the "typical" case rather than the mathematical center.
The Mode: The Most Common Value
The mode is simply the value that appears most frequently in a dataset. It's the only one of the three measures that works with purely categorical (non-numerical) data, such as colors or brand names, where values can be counted but not added up or ranked.
Examples of mode in action:
- The most common shoe size sold in a store (useful for inventory planning)
- The most frequently selected answer on a multiple-choice survey
- The most common age at which people in a country first own a smartphone
A dataset can have no mode (all values unique), one mode (unimodal), or multiple modes (bimodal, multimodal). When a distribution is bimodal, the mean and median can be deeply misleading — the data is telling you there are actually two different groups worth analyzing separately.
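A small Python sketch of mode-finding with `collections.Counter`, which works just as well on categories as on numbers. The helpers `modes` and `describe_modality` are hypothetical, and the survey answers and shoe sizes are made-up inputs for illustration only. Following the convention described above, the sketch reports no mode when every value is unique.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # all values unique: treated as "no mode"
    return [value for value, count in counts.items() if count == top]

def describe_modality(values):
    """Label a dataset by how many modes it has."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")

# Hypothetical inputs, for illustration only.
survey_answers = ["B", "C", "B", "A", "B", "D"]  # categorical data
shoe_sizes = [38, 39, 38, 44, 45, 44]            # two equally common sizes
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]

print(modes(survey_answers), describe_modality(survey_answers))  # ['B'] unimodal
print(modes(shoe_sizes), describe_modality(shoe_sizes))          # [38, 44] bimodal
print(modes(incomes), describe_modality(incomes))                # [] no mode
```

This only counts exact repeats. In continuous data, where exact ties are rare, bimodality shows up as two peaks in a histogram or density estimate rather than as tied counts, so a tie-counting check like this one will not detect it.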
Use the mode when: dealing with categorical data, identifying the most typical response or product, or when the distribution is multimodal.
Quick Reference Guide
| Measure | Formula | Best For | Avoid When |
|---|---|---|---|
| Mean | Sum ÷ Count | Symmetric data, no outliers | Skewed data, extreme outliers |
| Median | Middle value | Skewed data, income, prices | Multimodal distributions |
| Mode | Most frequent value | Categorical data, typical response | Continuous data with no repeats |
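For everyday use, Python's standard `statistics` module provides all three measures. A quick side-by-side on the income example, assuming Python 3.8 or later (the version that added `multimode` and changed how `mode` handles ties):

```python
import statistics

incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]

print(statistics.mean(incomes))       # 230000
print(statistics.median(incomes))     # 40000
print(statistics.multimode(incomes))  # [30000, 35000, 40000, 45000, 1000000]

# With every value tied at one occurrence, mode() returns the first one it saw.
print(statistics.mode(incomes))       # 30000
```

Neither `multimode` nor `mode` will report "no mode" on its own: `multimode` returns every value when none repeats, and `mode` returns the first of several tied values instead of raising an error. Reading the data as having no mode, per the table's last row, needs an explicit check for repeated values.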
The Takeaway
Whenever you see the word "average," your first question should be: which kind? The choice of measure shapes the story the data tells. A city reporting "average rent" using the mean will paint a different picture than one using the median. A product team using the mode to understand customer preferences is likely to make better inventory decisions than one relying on mean purchase values.
Data literacy begins with recognizing that even the simplest summary statistic carries a choice — and that choice matters.