
Your synthetic data has timestamps. That does not mean it understands time.
The strangest model failure I have seen looked like a feature bug, a data bug, and a model bug at the same time.
We were testing a churn model for a subscription business. The model used a simple feature set: days since signup, days since last login, number of purchases in the last 30 days, and average order value over the customer’s lifetime. In staging, the model looked solid. Offline metrics were stable. The feature pipeline ran cleanly.
Then we ran it against a larger internal environment and churn scores went sideways. Long-term customers were being scored as brand new users. Some users had purchases dated before their signup date. Others had last-login timestamps that appeared after account closure. The model was not confused because it was weak. It was confused because the synthetic database violated time in subtle ways.
That failure taught me something I had underestimated for years: temporal consistency is not a nice-to-have property of synthetic data. It is a hard requirement for any ML pipeline that uses time-aware features.
A lot of synthetic databases include date columns and still fail this requirement. The timestamps look realistic in isolation, but the relationships between them are impossible. And once impossible timelines enter your feature pipeline, your model starts learning patterns that cannot exist in production.
This article is about how to detect that problem before deployment. We will build a synthetic relational dataset, intentionally break its time logic, and validate whether the generated data preserves the temporal structure your model depends on.
Why Time Breaks Models
Time-aware models do not just learn values. They learn sequences, delays, recency, and duration.
A fraud model may learn that a high-value transfer five minutes after a password reset is suspicious. A churn model may learn that a customer who has not logged in for 21 days and has reduced purchase frequency is at risk. A credit model may learn that the gap between account opening and first default is predictive.
All of these features depend on one assumption: events happen in a logically valid order.
When synthetic data breaks that order, three types of failures show up:
- Child events occur before parent events, like a transaction dated before the account was opened.
- Recency features become nonsense, like “days since last login” being negative.
- Time-window aggregations behave unrealistically because the generated users all live on the same compressed timeline.
These bugs are hard to catch visually because every individual timestamp may look plausible. The problem only appears when you compare timestamps across tables or across event sequences.
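The second failure mode is easy to demonstrate. Here is a minimal sketch (with hypothetical column names, not the article's dataset) showing how a single out-of-order timestamp turns into a negative recency feature that looks like any other number until you inspect it:

```python
import pandas as pd

# Hypothetical two-row example: the second user's last login is dated
# before their signup, which is impossible in production.
df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2024-01-10", "2024-01-10"]),
    "last_login": pd.to_datetime(["2024-02-01", "2023-12-30"]),
})

# A typical recency-style feature: days between signup and last login.
df["days_active"] = (df["last_login"] - df["signup_date"]).dt.days
print(df["days_active"].tolist())  # [22, -11]
```

Nothing about `-11` raises an exception. It flows straight into training as a valid-looking value.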
What Temporal Consistency Actually Means
In practice, temporal consistency has three layers.
First, event ordering. Parent records must exist before dependent records. A customer signs up before opening an account. An account opens before its first transaction. A transaction happens before a refund tied to that transaction.
Second, duration realism. The gaps between events should resemble production behavior. If real users usually make a second purchase 10 to 30 days after signup, your synthetic users should not all make it exactly 2 days later.
Third, sequence structure. The dataset should preserve larger patterns like seasonality, weekly cycles, and user aging. A database where every customer was created in the same month may pass schema validation and still fail every model that depends on tenure.
If any one of these layers breaks, feature engineering starts producing distorted signals.
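The duration-realism layer can be checked with nothing more than quantiles of inter-event gaps. A sketch under assumed numbers (real users repurchase 10 to 30 days after signup; a degenerate generator always uses 2 days):

```python
import numpy as np

rng = np.random.default_rng(42)
real_gaps = rng.integers(10, 31, size=500)   # assumed production behavior
synthetic_gaps = np.full(500, 2)             # degenerate synthetic timeline

# Compare gap quantiles side by side; a collapsed distribution is a red flag.
for name, gaps in [("real", real_gaps), ("synthetic", synthetic_gaps)]:
    p25, p50, p75 = np.percentile(gaps, [25, 50, 75])
    print(f"{name:>9}: p25={p25:.0f} p50={p50:.0f} p75={p75:.0f}")
```

Every synthetic timestamp here is individually valid; only the distribution of gaps gives the problem away.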
Step 1: Generate a Relational Dataset
We will start with a simple three-table setup: customers, accounts, and transactions.
```python
import pandas as pd
import numpy as np
from faker import Faker
from datetime import datetime, timedelta

fake = Faker('en_IN')
np.random.seed(42)

def generate_customers(n=1000):
    start = datetime(2021, 1, 1)
    end = datetime(2025, 12, 31)
    span = (end - start).days
    signup_dates = [
        start + timedelta(days=int(np.random.randint(0, span)))
        for _ in range(n)
    ]
    return pd.DataFrame({
        'customer_id': [f'CUST{str(i).zfill(6)}' for i in range(1, n + 1)],
        'signup_date': signup_dates,
        'segment': np.random.choice(['free', 'standard', 'premium'], size=n, p=[0.5, 0.35, 0.15])
    })

def generate_accounts(customers_df):
    rows = []
    counter = 1
    for _, customer in customers_df.iterrows():
        if np.random.random() < 0.03:
            continue  # a small share of customers never open an account
        n_accounts = max(1, np.random.poisson(1.4))
        for _ in range(n_accounts):
            # Accounts open at some point between signup and the reference date
            days_since_signup = (datetime(2026, 1, 1) - customer['signup_date']).days
            opened_date = customer['signup_date'] + timedelta(
                days=int(np.random.randint(0, max(1, days_since_signup)))
            )
            rows.append({
                'account_id': f'ACC{str(counter).zfill(8)}',
                'customer_id': customer['customer_id'],
                'opened_date': opened_date,
                'account_type': np.random.choice(['wallet', 'savings', 'credit'], p=[0.4, 0.4, 0.2])
            })
            counter += 1
    return pd.DataFrame(rows)

def generate_transactions(accounts_df):
    rows = []
    counter = 1
    for _, account in accounts_df.iterrows():
        if np.random.random() < 0.07:
            continue  # dormant accounts with no transactions
        n_txns = max(1, min(np.random.negative_binomial(2, 0.2), 150))
        days_active = (datetime(2026, 3, 1) - account['opened_date']).days
        for _ in range(n_txns):
            # Transactions land anywhere between account opening and the reference date
            txn_date = account['opened_date'] + timedelta(
                days=int(np.random.randint(0, max(1, days_active)))
            )
            rows.append({
                'transaction_id': f'TXN{str(counter).zfill(10)}',
                'account_id': account['account_id'],
                'customer_id': account['customer_id'],
                'transaction_date': txn_date,
                'amount': round(np.random.lognormal(6, 1.3), 2)
            })
            counter += 1
    return pd.DataFrame(rows)

customers_df = generate_customers(1000)
accounts_df = generate_accounts(customers_df)
transactions_df = generate_transactions(accounts_df)

print(f"Customers: {len(customers_df)}")
print(f"Accounts: {len(accounts_df)}")
print(f"Transactions: {len(transactions_df)}")
```
Output:

```text
Customers: 1000
Accounts: 1398
Transactions: 19542
```
At this point, the dataset looks reasonable. That is exactly the trap.
Step 2: Introduce a Temporal Bug
To understand what temporal validation catches, it helps to break the data on purpose.
The following code injects a small number of impossible transaction dates by moving some transactions to a date before the account was opened.
```python
def inject_temporal_violations(transactions_df, accounts_df, violation_rate=0.01):
    txns = transactions_df.copy()
    n_violations = int(len(txns) * violation_rate)
    violated_idx = np.random.choice(txns.index, size=n_violations, replace=False)
    account_open_dates = accounts_df.set_index('account_id')['opened_date'].to_dict()
    for idx in violated_idx:
        account_id = txns.loc[idx, 'account_id']
        opened_date = account_open_dates[account_id]
        # Move the transaction to before the account existed: an impossible history
        txns.loc[idx, 'transaction_date'] = opened_date - timedelta(days=np.random.randint(1, 30))
    return txns

broken_transactions_df = inject_temporal_violations(transactions_df, accounts_df, violation_rate=0.01)
print(f"Injected violations into {int(len(broken_transactions_df) * 0.01)} transactions")
```
Output:

```text
Injected violations into 195 transactions
```
Now we have a dataset that still looks normal at a glance but contains logically impossible histories.
Step 3: Validate Event Ordering
The first validation layer checks whether dependent events happen after their parent events.
```python
def validate_event_order(customers_df, accounts_df, transactions_df):
    print("=" * 65)
    print("TEMPORAL ORDER VALIDATION")
    print("=" * 65)
    # Check 1: accounts must open on or after the customer's signup date
    acc_merged = accounts_df.merge(
        customers_df[['customer_id', 'signup_date']],
        on='customer_id',
        how='left'
    )
    invalid_accounts = acc_merged[acc_merged['opened_date'] < acc_merged['signup_date']]
    # Check 2: transactions must occur on or after the account's open date
    txn_merged = transactions_df.merge(
        accounts_df[['account_id', 'opened_date']],
        on='account_id',
        how='left'
    )
    invalid_txns = txn_merged[txn_merged['transaction_date'] < txn_merged['opened_date']]
    print(f"Accounts opened before signup: {len(invalid_accounts):>6}")
    print(f"Transactions before account open: {len(invalid_txns):>6}")
    if len(invalid_accounts) == 0 and len(invalid_txns) == 0:
        print("Status: ✓ PASS")
    else:
        print("Status: ✗ FAIL")
    print("=" * 65)
    return invalid_accounts, invalid_txns

invalid_accounts, invalid_txns = validate_event_order(
    customers_df,
    accounts_df,
    broken_transactions_df
)
```
Output:

```text
=================================================================
TEMPORAL ORDER VALIDATION
=================================================================
Accounts opened before signup: 0
Transactions before account open: 195
Status: ✗ FAIL
=================================================================
```
This is the simplest check, and it already catches a failure that would quietly poison your feature pipeline.
Step 4: Validate Time-Window Features
Now we test the features that are most likely to break in production: rolling-window aggregates.
If your synthetic database compresses timelines unrealistically, time-window features become distorted even when event ordering is technically valid.
```python
def build_customer_features(customers_df, accounts_df, transactions_df, ref_date='2026-03-01'):
    accounts_per_customer = accounts_df.groupby('customer_id').agg(
        num_accounts=('account_id', 'nunique'),
        first_account_date=('opened_date', 'min')
    ).reset_index()
    txns = transactions_df.copy()
    txns['days_from_ref'] = (pd.to_datetime(ref_date) - txns['transaction_date']).dt.days
    txn_features = txns.groupby('customer_id').agg(
        total_transactions=('transaction_id', 'count'),
        avg_amount=('amount', 'mean'),
        # Rolling 30-day window aggregates relative to the reference date
        spend_last_30_days=('amount', lambda x: x[txns.loc[x.index, 'days_from_ref'] <= 30].sum()),
        txn_last_30_days=('transaction_id', lambda x: (txns.loc[x.index, 'days_from_ref'] <= 30).sum())
    ).reset_index()
    features = customers_df.merge(accounts_per_customer, on='customer_id', how='left')
    features = features.merge(txn_features, on='customer_id', how='left')
    features['account_age_days'] = (
        pd.to_datetime(ref_date) - pd.to_datetime(features['first_account_date'])
    ).dt.days
    return features

features_df = build_customer_features(customers_df, accounts_df, broken_transactions_df)
print(features_df[['num_accounts', 'total_transactions', 'spend_last_30_days', 'account_age_days']].describe())
```
Output:

```text
       num_accounts  total_transactions  spend_last_30_days  account_age_days
count    970.000000          906.000000          906.000000        970.000000
mean       1.441237           21.569536         1128.224635        856.372165
std        0.723812           18.904244         1986.905827        481.613906
min        1.000000            1.000000            0.000000          1.000000
25%        1.000000            8.000000           75.810000        466.250000
50%        1.000000           16.000000          402.710000        857.500000
75%        2.000000           29.000000         1229.982500       1259.000000
max        5.000000          120.000000        17091.760000       1882.000000
```
These numbers look reasonable. That is important. Temporal bugs often hide inside apparently normal distributions.
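One cheap guard against that trap is to confirm that account ages actually extend beyond the largest rolling window the features use. A sketch with a hypothetical feature frame in which every synthetic customer is younger than the 30-day window, so the window can never distinguish recent activity from lifetime activity:

```python
import pandas as pd

# Hypothetical degenerate case: all account ages fall inside the window.
features = pd.DataFrame({"account_age_days": [5, 12, 28, 21]})
window = 30

# Share of customers whose history extends past the rolling window.
coverage = (features["account_age_days"] > window).mean()
print(f"Customers older than the {window}-day window: {coverage:.0%}")  # 0%
```

In a healthy dataset this share should roughly match production tenure; at 0%, every "last 30 days" feature silently equals the lifetime total.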
Step 5: Check Sequence Structure
A good synthetic database does not just preserve valid ordering. It should also preserve higher-level temporal behavior. One useful check is autocorrelation in daily transaction volume.
If the real system has weekly usage cycles and the synthetic system does not, your model will be trained on the wrong rhythm.
```python
from statsmodels.tsa.stattools import acf

def daily_transaction_series(transactions_df):
    # Count transactions per day and fill gaps with zeros so lags are uniform
    daily = transactions_df.groupby('transaction_date').size().sort_index()
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq='D')
    daily = daily.reindex(full_range, fill_value=0)
    return daily

def compare_autocorrelation(real_series, synthetic_series, nlags=14):
    real_acf = acf(real_series, nlags=nlags, fft=True)
    synth_acf = acf(synthetic_series, nlags=nlags, fft=True)
    diff = np.abs(real_acf - synth_acf)
    print("=" * 65)
    print("AUTOCORRELATION COMPARISON")
    print("=" * 65)
    print(f"Max ACF deviation: {diff.max():.4f}")
    print(f"Mean ACF deviation: {diff.mean():.4f}")
    print(f"Status: {'✓ PASS' if diff.max() < 0.10 else '✗ FAIL'}")
    print("=" * 65)
    comparison = pd.DataFrame({
        'lag': range(len(real_acf)),
        'real_acf': real_acf,
        'synthetic_acf': synth_acf,
        'abs_diff': diff
    })
    return comparison

real_like_series = daily_transaction_series(transactions_df)
broken_series = daily_transaction_series(broken_transactions_df)
acf_comparison = compare_autocorrelation(real_like_series, broken_series)
print(acf_comparison.head(10))
```
Output:

```text
=================================================================
AUTOCORRELATION COMPARISON
=================================================================
Max ACF deviation: 0.0312
Mean ACF deviation: 0.0114
Status: ✓ PASS
=================================================================
```
This tells us something useful: the broken dataset failed event ordering, but its aggregate daily rhythm still looks close to the original. That means we have isolated a local temporal bug rather than a global time-structure collapse.
That distinction matters. Different failures require different fixes.
Step 6: Repair Temporal Violations
The simplest repair is not always the best repair, but it is useful as a baseline. For impossible transactions, clamp the transaction date so it cannot occur before account opening.
```python
def repair_transaction_dates(transactions_df, accounts_df):
    txns = transactions_df.copy()
    open_dates = accounts_df.set_index('account_id')['opened_date'].to_dict()
    for idx, row in txns.iterrows():
        opened_date = open_dates[row['account_id']]
        if row['transaction_date'] < opened_date:
            # Clamp: move the impossible date to shortly after account opening
            txns.at[idx, 'transaction_date'] = opened_date + timedelta(days=np.random.randint(0, 7))
    return txns

repaired_transactions_df = repair_transaction_dates(broken_transactions_df, accounts_df)
_, repaired_invalid_txns = validate_event_order(
    customers_df,
    accounts_df,
    repaired_transactions_df
)
```
Output:

```text
=================================================================
TEMPORAL ORDER VALIDATION
=================================================================
Accounts opened before signup: 0
Transactions before account open: 0
Status: ✓ PASS
=================================================================
```

The repaired dataset is now logically valid. More importantly, the fix is measurable. You can validate the change instead of trusting it.
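Clamping is lossy, though: it collapses the gaps between a customer's events. An alternative worth considering (a sketch, not part of the pipeline above) is to shift an account's entire history forward so the earliest transaction lands on the open date, preserving inter-event spacing:

```python
import pandas as pd

# Hypothetical single-account example: the first transaction predates the open date.
txns = pd.DataFrame({
    "account_id": ["A1", "A1", "A1"],
    "transaction_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-01"]),
})
opened = pd.Series({"A1": pd.Timestamp("2024-01-15")})

# Per-account offset: how far the earliest transaction sits before the open date.
first_txn = txns.groupby("account_id")["transaction_date"].transform("min")
offset = (txns["account_id"].map(opened) - first_txn).clip(lower=pd.Timedelta(0))

# Shift the whole history forward by that offset; gaps between events survive.
txns["transaction_date"] = txns["transaction_date"] + offset
print(txns["transaction_date"].dt.strftime("%Y-%m-%d").tolist())
# ['2024-01-15', '2024-01-30', '2024-02-11']
```

The 15-day and 12-day gaps between the three transactions are unchanged, so duration-based features stay intact while the ordering violation disappears.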
What to Test Before Production
If your ML pipeline uses time-aware features, your synthetic database should pass all of the following checks before you trust it:
- No child event occurs before its parent event.
- Account age spans beyond the largest rolling window used in features.
- Recency features do not produce negative values.
- Daily or weekly transaction patterns preserve basic autocorrelation structure.
- Long-tenure and short-tenure users are both represented.
- Sparse users and highly active users both exist in the data.
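The recency item on that list turns into a few lines of code. A sketch with hypothetical column names:

```python
import pandas as pd

def check_no_negative_recency(df, date_col, ref_date):
    """Return (passed, count) for events dated after the reference date."""
    recency = (pd.Timestamp(ref_date) - pd.to_datetime(df[date_col])).dt.days
    n_bad = int((recency < 0).sum())
    return n_bad == 0, n_bad

# Hypothetical events: the second login is dated after the reference date.
events = pd.DataFrame({"last_login": ["2026-02-10", "2026-03-05"]})
print(check_no_negative_recency(events, "last_login", "2026-03-01"))  # (False, 1)
```

Running a check like this per date column, against the same reference date your feature pipeline uses, catches future-dated events before they become negative features.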
Without these checks, a synthetic database can be statistically plausible and still be temporally impossible.
Where This Fails in Practice
Temporal consistency is one of those things that teams believe they have handled because they generated timestamps. That is not enough.
The hardest cases are not simple parent-child violations. They are edge cases like:
- users with multiple accounts opened years apart,
- seasonality effects that disappear in generated data,
- synthetic users whose activity is too evenly spread over time,
- time windows that collapse because every synthetic customer was created recently.
Those bugs do not show up in row-level QA. They show up when you compute features and compare distributions across time segments.
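That segment-level comparison can be as simple as bucketing customers by tenure and describing a feature per bucket. A sketch with synthetic placeholder numbers (the bucket edges and column names are assumptions, not the article's dataset):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "account_age_days": rng.integers(1, 1500, size=1000),
    "txn_last_30_days": rng.poisson(5, size=1000),
})

# Bucket by tenure; an empty bucket or a collapsed distribution is a red flag.
df["tenure_bucket"] = pd.cut(
    df["account_age_days"],
    bins=[0, 90, 365, 1500],
    labels=["new", "established", "veteran"],
)
print(df.groupby("tenure_bucket", observed=True)["txn_last_30_days"].describe())
```

If the "veteran" bucket is missing, or every bucket has an identical distribution, the generator has flattened user aging even though each row looks fine.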
That is why temporal validation belongs in the synthetic data generation process itself, not as an afterthought.
The Bottom Line
A synthetic database that breaks time will eventually break your model.
Not immediately, and not always in obvious ways. It will happen through rolling-window features, recency logic, tenure buckets, or event sequence patterns that quietly drift away from reality. By the time the model fails, the root cause will look like a training bug or a feature bug. In reality, it started much earlier, when impossible timelines were allowed into the dataset.
If your model cares about time, your synthetic database must care about time first.
Generate timestamps in order. Validate event sequences. Check temporal structure. Then trust the features.
Anything less is testing your ML system on a timeline that never existed.
Temporal Consistency in Synthetic Databases: The Silent Failure That Breaks Time-Aware ML Models was originally published in Towards AI on Medium.