How we used Welch’s method and grouped Polars operations to extract frequency-domain features from 180+ sensors — and improved anomaly detection by 11%

The Problem With Raw Sensor Data
If you’ve ever worked with industrial sensor data, you know the feeling. You’re staring at a time-series plot — vacuum pressure oscillating, milk yield pulsing, hydraulic oil pressure fluctuating — and somewhere in that mess is a signal that says “this machine is about to break down.”
Our project monitors milking robots at dairy farms. Each robot has over 180 sensors generating data at up to 300 Hz. That’s hundreds of thousands of data points per milking session. The raw signals are noisy, high-dimensional, and at first glance, uninformative.
Traditional time-domain features — rolling averages, standard deviations, peak-to-peak measurements — capture amplitude behavior well. They tell you “the pressure is higher than usual” or “the variance increased.” But they miss something fundamental: how the signal’s frequency content is changing over time.
A pump that’s wearing out doesn’t just get louder — it vibrates at different frequencies. A valve that’s sticking doesn’t just slow down — its oscillation pattern shifts. These frequency-domain changes are often the earliest indicators of mechanical degradation, appearing days or weeks before amplitude-based features show anything unusual.
We needed a way to capture this information. Efficiently. Across 20+ sensor channels. Over millions of time-series records.

What is FFT and Why Should You Care?
The Fast Fourier Transform is one of the most important algorithms in signal processing. It takes a time-domain signal — amplitude over time — and decomposes it into its constituent frequencies. Think of it like a prism splitting white light into a rainbow. The signal goes in as a complex waveform, and comes out as a spectrum showing which frequencies are present and how strong each one is.
Mathematically, the Discrete Fourier Transform of a signal x[n] with N samples is:

X[k] = Σ_{n=0}^{N−1} x[n] · e^{−i2πkn/N},  k = 0, 1, …, N−1
Each X[k] is a complex number representing the amplitude and phase of frequency bin k. The FFT is simply an efficient algorithm that computes this in O(N log N) instead of O(N²).
For a pure sine wave at 50 Hz, the FFT would show a single spike at 50 Hz. For real-world sensor data, you get a distribution — many frequencies at various strengths. This distribution is what we care about.
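The 50 Hz case is easy to verify with NumPy (a minimal sketch; the 1000 Hz sampling rate and one-second duration are arbitrary choices for illustration):

```python
import numpy as np

fs = 1000                             # sampling rate in Hz (arbitrary for this sketch)
t = np.arange(0, 1, 1 / fs)           # one second of evenly-spaced samples
signal = np.sin(2 * np.pi * 50 * t)   # pure 50 Hz sine wave

spectrum = np.abs(np.fft.rfft(signal))        # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), 1 / fs)  # frequency bin centers

peak = freqs[np.argmax(spectrum)]
print(peak)  # 50.0 -- a single spike at the sine's frequency
```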
But raw FFT has problems for our use case:
· It assumes the signal is stationary (same frequency content throughout). Real signals change over time
· It’s sensitive to signal length and produces complex-valued output
· A single FFT over a long recording averages out temporal changes we want to detect
What we actually need is the Power Spectral Density (PSD) — a real-valued, smoothed estimate of how power is distributed across frequencies. And we need it computed over sliding windows so we can track how the frequency content evolves.
What is Power Spectral Density and Why Should You Care?
If FFT tells you which frequencies are present in a signal, Power Spectral Density tells you how much energy each frequency carries. Think of it as the energy fingerprint of a signal. Formally, PSD at frequency f is the squared magnitude of the Fourier transform, normalized by signal length:

S(f) = (1/N) · |X(f)|²
Imagine a complex waveform from a hydraulic pump. It’s a mess of overlapping oscillations — you can’t tell what’s happening just by looking at it. PSD decomposes that mess into individual frequency components and measures the power of each one. Low-frequency components might represent the pump’s normal rotation cycle. Higher frequencies might come from bearing vibration or valve chatter.
The key insight for predictive maintenance: a healthy machine has a stable PSD. The energy distribution across frequencies stays consistent day after day. When something starts degrading — a bearing wearing, a belt loosening — the energy shifts. Power moves from one frequency range to another. This shift is often the earliest measurable sign that something is wrong, appearing before any change in amplitude, temperature, or flow rate.
PSD gives you a concise description of “how the machine is vibrating” rather than “how much the sensor value is.” That’s a fundamentally different and complementary perspective to traditional time-domain features.

Welch’s Method — An Intuitive Explanation
Welch’s method is an elegant approach to estimating PSD. Instead of computing one giant FFT, it:
1. Splits the signal into overlapping segments — like reading a book through a sliding window
2. Applies a window function (typically Hann) to each segment — this reduces spectral leakage at segment boundaries
3. Computes the FFT of each windowed segment
4. Averages the squared magnitudes across all segments
The result is a smoother, more reliable PSD estimate than a single FFT would give. The averaging reduces variance (noise in the estimate), while the overlapping segments preserve temporal resolution. For K overlapping segments, each windowed by w[n], Welch’s PSD estimate is:

P̂_W(f) = (1/K) · Σ_{k=1}^{K} (1/(L·U)) · |Σ_{n=0}^{L−1} w[n] · x_k[n] · e^{−i2πfn/L}|²

where L is the segment length, x_k[n] is the k-th segment, and U = (1/L) · Σ_{n=0}^{L−1} w[n]² is the window normalization factor. In practice, scipy.signal.welch handles all of this.

from scipy.signal import welch
from numpy import average
# Estimate PSD using Welch's method
frequencies, psd = welch(signal, fs=sampling_rate, nperseg=2048)
# Weighted average: dominant frequency
dominant_freq = average(frequencies, weights=psd)
Two lines of actual computation. scipy.signal.welch returns the frequency bins and their power values. The weighted average then collapses this entire distribution into a single number — the dominant frequency:

f̄ = (Σ_k f_k · S(f_k)) / (Σ_k S(f_k))
Each frequency f_k is weighted by its power S(f_k). Frequencies with more energy pull the average toward themselves. This is essentially the “center of mass” of the power spectrum.
High power at low frequencies pulls the average down, high power at high frequencies pushes it up.
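A toy example makes the center-of-mass reading concrete (the two bins and their powers are made up for illustration):

```python
import numpy as np

# Two frequency bins: 1 Hz carrying 3 units of power, 10 Hz carrying 1 unit
freqs = np.array([1.0, 10.0])
power = np.array([3.0, 1.0])

dominant = np.average(freqs, weights=power)
print(dominant)  # (1*3 + 10*1) / (3 + 1) = 3.25 -- pulled toward the stronger bin
```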

This single number — the PSD-weighted mean frequency — becomes our feature. It tracks how the machine’s vibration signature shifts over time. A gradual increase in dominant frequency on the hydraulic pump might indicate bearing wear. A sudden drop on the vacuum sensor could signal a leak.
Why Windows and Overlap Matter
Two parameters control the trade-off between time resolution and frequency resolution:
Window size determines how many samples each segment contains. Larger windows give better frequency resolution (can distinguish close frequencies) but worse time resolution (can’t see rapid changes). Smaller windows do the opposite.
Overlap controls how much adjacent windows share. With 90% overlap, a 60-minute window advances by only 6 minutes between calculations. This creates a smoother, more responsive feature at the cost of more computation.
For our milking robot data sampled at 1 Hz, we found that 10-minute windows with 90% overlap struck the right balance. The robots cycle through distinct operational phases (milking, cleaning, idle) that last minutes, not seconds — so 10-minute windows capture full phase transitions while the high overlap ensures we don’t miss short-lived anomalies.
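Those choices translate directly into sample counts at a 1 Hz rate (a sketch of the arithmetic, matching the 10-minute window and 90% overlap described above):

```python
fs = 1.0          # Hz, after resampling
window_s = 10 * 60  # 10-minute feature window, in seconds
overlap = 0.90

window_samples = int(window_s * fs)                 # samples per window
step_samples = round(window_samples * (1 - overlap))  # how far the window advances
print(window_samples, step_samples)  # 600 60 -- i.e. 10-minute windows every 1 minute
```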

The Prerequisite: Uniform Sampling
There’s one hard requirement that’s easy to overlook. The FFT — and by extension Welch’s method — assumes that samples are evenly spaced in time. Real sensor data almost never is.
Industrial sensors drop packets, log at variable rates, and produce timestamps with jitter. You’ll see clusters of samples followed by gaps, sometimes within the same second. Feed this directly to scipy.signal.welch and the frequency estimates will be wrong — the algorithm has no way to know that the spacing is irregular.
The fix is straightforward: resample to a fixed grid before computing FFT. Interpolate the raw data onto evenly-spaced timestamps at your target sampling rate. This gives Welch a clean signal with a known fs parameter, making the frequency axis of the PSD meaningful.
We resample our milking robot data to 1 Hz (1 sample per second) before any frequency-domain computation. The original data arrives at up to 300 Hz but with irregular gaps — resampling to a lower, uniform rate also reduces data volume without losing the frequency information we care about (our features of interest are well below 0.5 Hz).
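One minimal way to do that resampling, assuming timestamps in seconds and simple linear interpolation via np.interp (a sketch, not necessarily the exact method used in the pipeline):

```python
import numpy as np

def resample_uniform(timestamps, values, fs=1.0):
    """Interpolate irregularly-sampled data onto an evenly-spaced grid at fs Hz."""
    t_uniform = np.arange(timestamps[0], timestamps[-1], 1.0 / fs)
    return t_uniform, np.interp(t_uniform, timestamps, values)

# Irregular timestamps with jitter and a gap (made-up values for illustration)
t_raw = np.array([0.0, 0.9, 2.1, 2.2, 5.0])
v_raw = np.array([1.0, 2.0, 3.0, 3.0, 6.0])

t, v = resample_uniform(t_raw, v_raw, fs=1.0)
print(t)  # [0. 1. 2. 3. 4.] -- a clean 1 Hz grid for welch(fs=1.0)
```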

The Performance Challenge: Rolling vs Grouped
Here’s where it gets interesting from an engineering perspective.
The naive approach is a rolling window — slide a window one sample at a time, compute PSD at each position. This works, but it’s brutally slow. For a 10-minute window at 1 Hz sampling, every position requires a Welch computation over 600 samples. Over a full day of data (86,400 samples), that’s 86,400 separate Welch calls. For 20 sensor channels, that’s 1.7 million PSD computations per day.
We tried it. It took over 40 minutes per robot per day. Unacceptable for a fleet of hundreds of machines.
The solution was grouped computation with interpolation:
1. Instead of sliding one sample at a time, divide the time-series into fixed groups spaced by (1 − overlap) × window_size
2. Compute PSD once per group (not once per sample)
3. Interpolate between group results to reconstruct the continuous feature
With 90% overlap on a 10-minute window, you compute PSD every 1 minute instead of every second. That’s a 60x reduction in Welch calls. The interpolation between points is nearly free.
import polars as pl
from scipy.signal import welch
from numpy import average

def psd_weighted_mean(signal, sampling_rate_hz):
    """Welch PSD -> single dominant-frequency value."""
    freqs, power = welch(signal, fs=sampling_rate_hz, nperseg=min(len(signal), 2048))
    if power.sum() == 0:
        return float("nan")
    return float(average(freqs, weights=power))

fs = 1.0  # sampling rate after resampling, in Hz

# Grouped computation in Polars: compute PSD once per window, then interpolate.
# At 1 Hz, a 10-minute window is 600 samples and a 1-minute step is 60 samples.
df = df.with_row_index("idx").with_columns(pl.col("idx").cast(pl.Int64))
windowed = (
    df.group_by_dynamic("idx", every="60i", period="600i")  # 1-min step, 10-min window
    .agg(
        pl.col("sensor_value")
        .map_elements(lambda s: psd_weighted_mean(s.to_numpy(), fs), return_dtype=pl.Float64)
        .alias("dominant_freq")
    )
)

# Join the per-window values back onto the original rows, interpolate between them
df = df.join(windowed, on="idx", how="left")
df = df.with_columns(pl.col("dominant_freq").interpolate())
The result? Processing 10 million time-series records across 20 sensor channels takes 4–5 seconds. The same operation with rolling windows would take minutes to hours.

Reading the PSD Feature in Practice
The plot below shows a single pressure sensor over one day. The top panel is the raw signal — noisy, dense, hard to interpret visually. The bottom panel is the PSD dominant power frequency extracted from the same data.
Every pressure event in the raw signal produces a peak in the PSD line — the dominant frequency rises because pressure is changing rapidly, likely valves working hard, pushing energy into higher frequencies. The three major events and the minor one are clearly visible as smooth peaks above baseline. Between events, the PSD settles back to a low resting frequency reflecting calm, steady-state operation.
This is what the autoencoder sees. Not the raw noise, but the clean frequency signature underneath.

What the Frequency Features Reveal
Once you have the dominant frequency feature computed for each sensor, the results are remarkably informative.
On a healthy machine, the frequency signature is stable. The vacuum pump runs at a consistent speed, the hydraulic system oscillates predictably, the milk flow follows regular patterns. The PSD-weighted mean frequency for each sensor stays within a narrow band.
When something goes wrong, the frequency signature shifts before the amplitude does. We observed:
· Bearing degradation — gradual upward frequency drift on pump sensors, appearing 3–5 days before amplitude anomalies
· Valve sticking — intermittent frequency drops on flow sensors, visible as “dips” in the dominant frequency trace
· Belt slippage — periodic frequency oscillations on motor sensors that weren’t visible in time-domain features at all
The frequency features captured patterns that rolling averages and standard deviations completely missed.
This is because amplitude-based features answer “how much?” while frequency features answer “how is the machine moving?” — a fundamentally different and complementary perspective.

Results
We added FFT-derived PSD features alongside our existing time-domain features (rolling averages, standard deviations, peak-to-peak, spread metrics) and retrained the anomaly detection model.
Before FFT features: The autoencoder-based anomaly detector achieved a baseline reconstruction quality that correctly identified known breakdown events in approximately 73% of cases.
After FFT features: Detection of known breakdown events rose by over 11 percentage points, to roughly 84%. The model could now detect subtle degradation patterns that were invisible in the time domain alone.
The improvement came from two sources:
· Earlier detection — frequency shifts appeared before amplitude changes, giving the model more lead time
· Fewer false positives — some amplitude anomalies that were actually normal (seasonal temperature changes, different cow sizes) had stable frequency signatures, helping the model distinguish real problems from benign variation

Critical: PSD and Train-Test Data Leakage
PSD at time t encodes information from the entire window around it — including future samples. If you compute PSD before splitting into train and test sets, a window at the split boundary will contain frequency content from both sides. The model trains on features that encode test-set signal characteristics. Your metrics will be optimistic.
Split first, then compute PSD independently on each partition. Trim boundary samples where the window is incomplete. This applies to any windowed feature, but PSD windows are typically much wider (10 minutes vs seconds), making the leakage worse.

Practical Implementation Notes
A few things we learned deploying this at scale:
1. Handle constant signals gracefully — Some sensors output constant values during idle periods. Welch’s method returns a zero-power spectrum in this case, which would cause a division-by-zero in the weighted average. Guard against this by returning NaN or 0.0 for constant signals
2. Cap the nperseg parameter — scipy.signal.welch can be slow with very large segment counts. We capped nperseg at 1 million, which gives excellent frequency resolution without excessive computation
3. Normalize after FFT — The raw weighted-mean frequency depends on sampling rate. Normalizing to [0, 1] range makes features comparable across sensors with different physical units
4. Phase-shift correction — Grouped computation introduces a half-window delay. Shift the interpolated results back by window_size * 0.5 to align with the original signal
5. Choose overlap based on signal dynamics — 90% works for slow-changing industrial processes. For faster dynamics, you might need 95% or even 99% overlap
When to Use Frequency Features
FFT-based PSD features add value when:
· Your data comes from mechanical or electrical systems with oscillating components
· You suspect degradation manifests as frequency changes before amplitude changes
· You have sufficient sampling rate to capture the frequencies of interest (Nyquist: at least 2x the highest frequency)
· Your time-domain features alone aren’t detecting all the failure modes
They’re less useful when:
· The signal is inherently non-periodic (random walk processes)
· Sampling rate is too low to capture meaningful frequency content
· The failure mode is purely amplitude-based (sudden sensor dropout)
Conclusion
The Fast Fourier Transform has been around since 1965. Welch’s method for PSD estimation since 1967. These aren’t new techniques. But applying them efficiently at scale on modern dataframe engines like Polars — processing millions of records in seconds rather than hours — makes them practical for production ML pipelines in a way they weren’t before.
For our predictive maintenance project, frequency-domain features delivered an 11% improvement in anomaly detection. The implementation required no exotic libraries — just scipy.signal.welch, numpy.average, and Polars’ group_by_dynamic. The entire feature engineering step adds under 5 seconds to the pipeline per robot per day.
If you’re working with sensor data and only using time-domain features, you’re leaving signal on the table. The frequency domain is where the early warnings live.
The author is a lead data scientist with years of experience in the manufacturing industry and intelligent automation systems.
Turning Sensor Noise Into Signal: FFT and Power Spectrum Features was originally published in Towards AI on Medium.