A Z-score is not just a formula you plug numbers into. It’s a question your data is asking and most people never hear it.

Let me start with a scenario.
You’re looking at two students. One scored 78 on a biology exam. The other scored 62 on a physics exam. Someone asks you: “Who did better?”
Your first instinct is probably: well, 78 is higher than 62, so the biology student did better.
But what if I told you the biology class average was 85, and the physics class average was 50?
Suddenly the picture changes completely. The biology student scored below their class. The physics student scored well above theirs. Raw numbers stopped making sense the moment context entered the room.
That’s exactly the problem a Z-score was invented to solve. And once you see it that way, the formula stops feeling like maths and starts feeling like common sense.
So what is a Z-score, really?
Here is the simplest version: a Z-score tells you how far a value is from the average, measured in units of spread.
Not in points. Not in percentages. In standard deviations.
That distinction matters more than it sounds. Because standard deviation is the data’s own natural unit of “how spread out things normally are.” When you measure distance in that unit, you’re suddenly speaking the data’s language.

The formula itself is simple: Z = (X − μ) / σ. The numerator, X − μ, asks: how far is this value from the center? Positive means above average. Negative means below. Zero means you’re exactly average.
Then dividing by σ rescales that distance. It’s saying: “Is this distance big or small, relative to how spread out the data normally is?”
A distance of 10 points means nothing in isolation. But 10 points when the standard deviation is 2? That’s enormous. When the standard deviation is 50? Barely a ripple.
The Z-score doesn’t ask “how far?” It asks “how far, compared to what’s normal here?”
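To make that concrete, here’s the 10-point distance from above measured against those two different spreads:

```python
# Same 10-point distance, measured against two different spreads
distance = 10

z_tight = distance / 2    # std dev of 2: z = 5.0, enormous
z_wide = distance / 50    # std dev of 50: z = 0.2, barely a ripple

print(z_tight, z_wide)  # → 5.0 0.2
```

Same raw distance, wildly different Z-scores, because the denominator carries the context.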
Back to the two students
Let’s actually run the numbers.
import numpy as np
# Biology student
x_bio = 78
mean_bio = 85
std_bio = 6
z_bio = (x_bio - mean_bio) / std_bio
print(f"Biology Z-score: {z_bio:.2f}") # → -1.17
# Physics student
x_phy = 62
mean_phy = 50
std_phy = 8
z_phy = (x_phy - mean_phy) / std_phy
print(f"Physics Z-score: {z_phy:.2f}") # → +1.50
The biology student’s Z-score is −1.17. They scored more than one standard deviation below their class.
The physics student’s Z-score is +1.50. They scored one and a half standard deviations above theirs.
The same raw numbers told you one story. The Z-scores tell an entirely different, and far more accurate, one.
What Z-scores actually feel like on a scale
Once you have a Z-score, the next question is: what does this number mean in practice? Here’s an intuition that helps.

Roughly: a |Z| up to 1 is normal territory, where most of your data lives. Between 1 and 3 is noteworthy, unusual enough to glance at but not to panic over. Beyond 3? Alarm bells. This is an outlier worth investigating closely.
This is why Z-scores are so widely used in anomaly detection, fraud detection, and quality control. You set a threshold, often ±2.5 or ±3, and anything beyond it gets flagged for a closer look. Not because you’ve proven something is wrong. Just because it’s unusual enough to ask questions about.
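If your data is roughly normal, you can put numbers on how rare each zone actually is, using SciPy’s normal distribution:

```python
from scipy.stats import norm

# Two-sided tail probability: how much of a roughly normal
# dataset lies beyond each |Z| threshold
for z in (1, 2, 2.5, 3):
    tail = 2 * norm.sf(z)  # sf = survival function, P(Z > z)
    print(f"|Z| > {z}: {tail:.2%} of values")
```

A ±2.5 threshold flags about 1.2% of normal data, and ±3 only about 0.3%, which is exactly why those cutoffs are popular defaults.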
Think of the Z-score the same way you’d think about a doctor measuring a child’s height. The actual height matters less than where it lands on the growth chart — above average, below average, or significantly outside the normal range for their age.
Where Z-scores quietly do important work
You might think this is mostly a classroom concept. It isn’t. Z-scores show up constantly in real data work, often without being named.
In feature scaling, when you standardise your data before feeding it into a machine learning model, you are computing Z-scores. Every value gets transformed so the mean becomes 0 and the standard deviation becomes 1. This stops one feature from dominating others just because its raw numbers happen to be larger.
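Here’s what that standardisation looks like in plain NumPy, on a small made-up table of two features with very different scales (the values are purely illustrative):

```python
import numpy as np

# Hypothetical feature matrix: column 0 is age, column 1 is income
X = np.array([[25, 40_000],
              [32, 85_000],
              [47, 120_000],
              [51, 62_000]], dtype=float)

# Standardise each column: subtract its mean, divide by its std
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # each column's mean is now ~0
print(X_scaled.std(axis=0))   # each column's std is now 1
```

This is the same computation scikit-learn’s StandardScaler performs under the hood: every value in `X_scaled` is just the Z-score of the original value within its column.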
In outlier detection, a Z-score above 3 or below −3 is a standard first pass for flagging suspicious data points, whether that’s a fraudulent transaction, a sensor malfunction, or a data entry error.
In comparing across distributions, just like the student example, any time you need to compare values from different scales or contexts, Z-scores create a shared language.
import numpy as np

data = np.array([12, 15, 14, 10, 100, 13, 11, 14, 15, 12])
mean = np.mean(data)
std = np.std(data)
z_scores = (data - mean) / std

for val, z in zip(data, z_scores):
    flag = " ← investigate" if abs(z) > 2 else ""
    print(f"Value: {val:>4} | Z-score: {z:+.2f}{flag}")
Run this and watch how 100 stands completely apart from everything else. The Z-score surfaces what your eye already suspects but couldn’t quantify.
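If you’d rather not write the formula by hand, SciPy ships it as scipy.stats.zscore, which computes exactly the same thing:

```python
import numpy as np
from scipy import stats

data = np.array([12, 15, 14, 10, 100, 13, 11, 14, 15, 12])

# Equivalent to (data - data.mean()) / data.std()
z_scores = stats.zscore(data)

print(np.abs(z_scores) > 2)  # True only at the 100
```

One function call, and the outlier mask falls out for free.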
import numpy as np
import matplotlib.pyplot as plt
# Generate normally distributed data (simulating Z-scores)
np.random.seed(42)
data = np.random.normal(0, 1, 10000)
# Define zone masks
normal = (np.abs(data) <= 1)
notable = (np.abs(data) > 1) & (np.abs(data) <= 3)
outlier = (np.abs(data) > 3)
plt.figure(figsize=(10, 5))
# Plot each zone in its color
plt.hist(data[normal], bins=60, color='#378ADD', alpha=0.9, label='Normal range (±1)')
plt.hist(data[notable], bins=60, color='#EF9F27', alpha=0.85, label='Noteworthy (±1–3)')
plt.hist(data[outlier], bins=60, color='#E24B4A', alpha=0.85, label='Outlier zone (beyond ±3)')
# Mark threshold lines
plt.axvline(-3, color='#E24B4A', linestyle='--', linewidth=1.2)
plt.axvline(-1, color='#378ADD', linestyle='--', linewidth=1.2)
plt.axvline( 1, color='#378ADD', linestyle='--', linewidth=1.2)
plt.axvline( 3, color='#E24B4A', linestyle='--', linewidth=1.2)
plt.title("Z-score Distribution - How Often Each Value Appears")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.legend()
plt.tight_layout()
plt.show()
Here’s what that looks like when you plot it:

The part people get wrong
Here’s a mistake that’s more common than you’d think.
Z-scores assume your data is roughly normally distributed. If your data is heavily skewed, like income or website traffic spikes, a Z-score can be misleading. A value flagged as an outlier might just be a normal feature of a skewed distribution, not a genuine anomaly.
Always visualise your distribution before leaning on Z-scores for anomaly detection. If it looks like a mountain, Z-scores are your friend. If it looks like a ski-slope, be careful.
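You can see the failure mode directly. Here’s a sketch with simulated income-like (log-normal) data: no anomalies are injected, yet the plain Z-score still flags a chunk of perfectly ordinary points in the long right tail:

```python
import numpy as np

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0, sigma=1, size=10_000)  # ski-slope shaped

# Ordinary Z-scores on heavily skewed data
z = (skewed - skewed.mean()) / skewed.std()
flagged = np.sum(np.abs(z) > 3)

print(f"Flagged beyond |Z| > 3: {flagged} of 10,000")
```

Every one of those flags is just the tail behaving normally. For data shaped like this, a log transform first, or a median-and-MAD-based “modified Z-score,” is usually the safer route.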
There’s also a subtler problem worth naming.
A high Z-score tells you a value is unusual relative to your dataset. It says nothing about whether that’s good, bad, meaningful, or worth acting on. A product with an unusually high return rate has a high Z-score. So does a product with an unusually high customer satisfaction rating. The number doesn’t carry judgment; you have to bring that yourself.
Statistical unusualness is not the same as practical significance. The Z-score opens the door. Your domain knowledge has to walk through it.
The takeaway
A Z-score is not a complicated idea dressed up in maths. It’s a deeply intuitive one that the formula sometimes obscures.
It answers a question your raw data can’t: not “what is this value?” but “where does this value stand among everything else, on this data’s own terms?”
Once you feel that shift, from absolute to relative, from isolated to contextual, you start seeing Z-scores everywhere. In scaling pipelines. In anomaly alerts. In the moment someone shows you two numbers and asks you to compare them.
“The real insight: every dataset has its own internal sense of ‘normal.’ A Z-score is just how you make that invisible standard visible, so you can finally ask the right question about where any value truly stands.”
And once you ask the right question, the data almost always answers.
You’ve Been Using Z-Scores. But Do You Actually Know What They’re Saying? was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.