Design for divergence on day one
Most schemas don’t fail because they’re wrong. They fail because they assume entities stay singular.
Variants collapse identity. Prices collapse time. Stock collapses location. Most schema migrations are attempts to put those dimensions back.
The team opens a second warehouse to handle northern demand. The day-one schema fit one warehouse fine: products.stock
as one column, one number, decremented on every order, incremented on every restock. There’s nowhere to put “how much
in London, how much in Manchester”. Every read of stock (checkout availability, search filters, the restock dashboard)
assumes one number. The new warehouse can’t ship a single order until that assumption breaks across all of them. What
was a procurement decision becomes a quarter-long inventory rewrite.
Stock wasn’t singular. It was stock per location. The schema just hadn’t admitted it yet.
The cost asymmetry is unforgiving: an hour on day one, a quarter and a season of bugs on day 60. Every coalesced column is a bet that no future reader will care about a different time, region, or context.
Products are families, not rows
The first product schema almost writes itself: products(id, name, price, sku, stock). It’s wrong from the first
variant. A T-shirt isn’t a product. It’s a family of variants (sizes, colours, materials) that share a name and not
much else. Stock is per-variant. Price differs across variants more often than the catalogue page admits. The “sold
out” state on the product page is a function of every variant’s stock, not the product’s.
Once you accept that products are families, the cascade is unforgiving. Cart items reference variants, not products. Order items snapshot the variant chosen, not the parent SKU. Search facets are variant attributes that the catalogue groups by product. Images are often per-variant when colour matters and shared when it doesn’t. Bundles compose variants into virtual SKUs with their own stock rules. Subscriptions bind to a variant with a cadence and a renewable entitlement.
-- day-one schema, wrong from the first variant
products(id, name, price, sku, stock)
-- the shape that scales
products(id, name, description)
variants(id, product_id, sku, size, colour, price, stock)
order_items(id, order_id, variant_id, unit_price, quantity)
The product holds what the family shares. Variants hold what makes each instance distinct. Order items snapshot the variant at the moment of purchase, so historical orders survive any catalogue change.
A schema that bolts variants onto a product-equals-row model adds a variants table sideways and threads variant_id
through every code path that already assumed product-equals-SKU. Stock disagrees with checkout. Search facets show
counts that don’t match the cart. The warehouse picks the wrong size because checkout’s selected_variant was a
string, not a foreign key. The migration takes a quarter, and the team’s confidence in the schema’s edges takes longer
to recover.
The schema that scales treats variants as the canonical SKU and products as a grouping concept. Stock, price, and
identity live at the variant level. The product is a query: aggregate variants for the listing page, group facets for
search, present the family on the product page. It costs a foreign key on day one and saves the quarter on day 60.
Price is a history, not a column
Variants is multiplicity. Price is multiplicity over time, and it’s the second test most schemas fail. A single price
column has to satisfy the catalogue, the buyer who paid, the refund that’s owed, the analyst running a margin report,
and the auditor reconstructing what the price was when the offer fired. 5 readers, 5 different truths.
The schema either keeps a prices table with valid_from, valid_to, region, and tier, and computes the catalogue’s
“current price” as a query, or it pretends a column can be the truth and forces every reader past the first to run a
separate calculation that drifts. Refunds become approximations. Analytics and finance reconcile by hand. The audit
log can’t explain what happened.
The first instinct under pressure is to reach for a smart column: current_price cached on products, refreshed by a
job that reads prices. That solves the catalogue’s read latency. It changes nothing for anyone past it. The refund
still doesn’t know what the buyer paid. The auditor still can’t reconstruct the offer. A smart column is a slower bet
on the same singular truth.
Time and region compound the trap. VAT changes at midnight. The schema with prices(valid_from, valid_to, region, tier) rolls forward without intervention. Orders placed before midnight keep the old rate, orders after pick up the
new one, refunds reconstruct the rate the buyer paid. The schema with a vat_rate on products either runs a
migration that night or quietly charges the wrong tax until someone notices.
The fix on day one is the same shape variants demanded: snapshot what the buyer paid into order_items.unit_price at
order time, and let prices carry the catalogue’s view of what’s offered now. Storage is cheap. Lying about what
someone paid is not. Variants and prices are different shapes of the same trap: a column pretending to be the truth
when the entity is plural. That’s most columns. Not all of them.
Where simple wins
3 cases where collapsing to a single value is correct:
- Hot-path counters where events are the truth. Page views, add-to-cart events, search impressions. The aggregate is a cache built from the event log. Lose it and you can rebuild it from source. The single number is a derived value, not the truth. Designing it as a snapshot table adds storage for no reader who’d want it.
- Ephemeral state with no audit value. Session tokens, idempotency keys (markers that stop a payment from processing twice), CSRF nonces. Their entire lifetime is bounded by the request or session they belong to. Snapshotting past expiry is the bug, not the absence of history. The schema’s job is to forget them on time.
- Facts about completed transactions. The VAT charged on order #12345 stays “20% standard rate” forever, even if
the rate changes next year. The order itself is the temporal anchor. The order’s
created_atalready plays thevalid_fromrole, and the value attached to it never needs to change.
Everywhere else, the entity is plural and the schema has to admit it.
The question
Ask of every column whether it answers “what is X now” or “what was X then”. If both readers exist, the column is in the wrong shape.
The test catches products.price (refunds need what was paid), customers.tier (the order needs the tier at purchase
time), products.tax_rate (refunds need the rate at sale). It passes users.email_verified (no historical reader)
and sessions.token (ephemeral by design).
When the test catches something, the response is binary. Move the truth to a table with valid_from, snapshotting the
past where it mattered, or accept that the column is a cache and name it that way. Refusing to choose is what creates
the retrofit. The check costs a minute in review and saves the quarter in retrospect.
Design for divergence on day one
Variants, prices over time, stock across warehouses, addresses snapshot against orders, customer roles across organisations: most of the schema, most of the time, will need to admit that the entity isn’t singular. The asymmetry is the whole argument. An hour on day one. A quarter on day 60. Plan for variants before you stock a second size. Plan for prices over time before the first promotion. Plan for stock per location before you open a second warehouse. Not because requirements demand it, but because the alternative is asking the past to lie.
Plural entities don’t fit singular columns.