How does embedding table size grow as new user and item IDs arrive?

This explores what physically happens to a recommender's embedding table — the giant lookup of one vector per user and item — as a live platform keeps minting new IDs, and the design tradeoffs that forces.

The embedding table is the lookup that stores one learned vector per user and per item, and on a real platform new IDs never stop arriving. The blunt answer from the corpus: as most published methods are built, the table doesn't grow gracefully at all. Most academic recommenders assume the users and items seen at test time were already present during training, which the note 'Why do recommendation models fail when new users arrive?' calls the transductive assumption. Real platforms are inductive: they must score people and products that didn't exist when the model was trained. So the question of 'how the table grows' is really a question of how you bolt new rows onto a structure that wasn't designed to receive them.
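To make the transductive assumption concrete, here is a minimal sketch of a table that is frozen at training time. The class name, the string IDs, and the zero-valued vectors are all hypothetical placeholders, not taken from any of the cited notes.

```python
class FrozenEmbeddingTable:
    """One vector per ID, built once from the IDs present at training time."""

    def __init__(self, training_ids, dim):
        # Zero vectors stand in for learned weights (illustrative only).
        self.rows = {id_: [0.0] * dim for id_ in training_ids}

    def lookup(self, id_):
        # An ID that appeared after training has no row to return.
        return self.rows[id_]


table = FrozenEmbeddingTable(training_ids=["user_1", "user_2"], dim=4)
seen = table.lookup("user_1")      # present during training, so this works

try:
    table.lookup("user_3")         # hypothetical user who signed up later
    unseen_failed = False
except KeyError:
    unseen_failed = True           # the table has nowhere to put this ID
```

The lookup for the late-arriving ID fails because the structure has no slot for it, which is the gap the rest of this note is about.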

The industry workaround is to fix the table size up front and hash every incoming ID into a slot. That caps memory, but model quality quietly rots over time. As more IDs pour in, collisions pile up, and crucially they don't land evenly. Recommendation traffic follows a power law, so the heaviest users and most popular items are exactly the ones most likely to collide and get their signals smeared together ('Do hash collisions really harm popular recommendation items?', 'Why do hash collisions hurt recommendation models so much?'). The fixed-size hashed table fails precisely where accuracy matters most, and it gets worse the longer the system runs. The alternative, a Monolith-style collisionless table that expands dynamically, lets the table genuinely grow with new IDs instead of overwriting shared slots, which is why production systems lean that way.
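The contrast between the two table designs can be sketched in a few lines. This is a toy under stated assumptions: integer IDs, modulo standing in for a real hash function, and zero vectors standing in for learned weights. The class and function names are hypothetical and this is not Monolith's implementation.

```python
class HashedTable:
    """Fixed number of rows; every ID is mapped onto one of them."""

    def __init__(self, num_slots, dim):
        self.num_slots = num_slots
        self.rows = [[0.0] * dim for _ in range(num_slots)]

    def slot(self, id_):
        # Modulo is a stand-in for a real hash function (assumption).
        return id_ % self.num_slots

    def lookup(self, id_):
        return self.rows[self.slot(id_)]


class DynamicTable:
    """Collisionless: a row is allocated the first time an ID is seen."""

    def __init__(self, dim):
        self.dim = dim
        self.rows = {}

    def lookup(self, id_):
        if id_ not in self.rows:
            self.rows[id_] = [0.0] * self.dim
        return self.rows[id_]


def colliding_fraction(num_ids, num_slots):
    """Fraction of IDs 0..num_ids-1 that share a slot with another ID."""
    if num_ids == 0:
        return 0.0
    counts = [0] * num_slots
    for id_ in range(num_ids):
        counts[id_ % num_slots] += 1
    shared = sum(c for c in counts if c > 1)
    return shared / num_ids


hashed = HashedTable(num_slots=4, dim=2)
dynamic = DynamicTable(dim=2)
for id_ in range(10):
    hashed.lookup(id_)
    dynamic.lookup(id_)

shares_row = hashed.lookup(1) is hashed.lookup(5)       # same slot, same vector
separate_rows = dynamic.lookup(1) is not dynamic.lookup(5)
growth = [colliding_fraction(n, 4) for n in (4, 6, 10)]  # rises as IDs arrive
```

After ten IDs the hashed table still holds 4 rows while the dynamic table holds 10, and IDs 1 and 5 read and write the same hashed vector. The toy treats every ID equally; it does not model the traffic weighting that, per the notes, concentrates the damage on high-frequency users and items.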

Here's the twist a curious reader might not expect: the size that matters isn't only the number of rows, it's also the width of each vector. 'Does embedding dimensionality secretly drive popularity bias in recommenders?' shows that if you shrink the per-ID dimension to save space, the model overfits toward popular items and niche items quietly starve, a fairness problem that compounds over time, just like collisions do. And 'Do embedding dimensions fundamentally limit retrievable document combinations?' proves there's a hard ceiling: for any fixed dimension, there's a maximum number of distinct top-k result combinations the embeddings can even represent. So 'grow the table' has two axes, more rows for new IDs and wider rows for expressive power, and both push against memory budgets.
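The two axes trade off directly under a fixed memory budget. A back-of-the-envelope sketch, assuming a dense table of 4-byte floats; the row counts and dimensions are made-up illustrative numbers, not figures from the cited notes.

```python
def table_bytes(num_rows, dim, bytes_per_value=4):
    """Dense table footprint: rows x width x bytes per stored value."""
    return num_rows * dim * bytes_per_value


def max_dim_for_budget(budget_bytes, num_rows, bytes_per_value=4):
    """Widest vector that still fits num_rows rows inside the budget."""
    return budget_bytes // (num_rows * bytes_per_value)


base = table_bytes(num_rows=100_000_000, dim=64)        # 25.6 GB
more_rows = table_bytes(num_rows=200_000_000, dim=64)   # twice the IDs
wider = table_bytes(num_rows=100_000_000, dim=128)      # twice the width

# Holding the budget at `base` while the ID count doubles halves the width.
squeezed_dim = max_dim_for_budget(base, num_rows=200_000_000)
```

In this example, doubling the ID count without adding memory forces the per-ID dimension from 64 down to 32, which is the direction the popularity-bias note warns against.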

That tension is why a chunk of the corpus tries to sidestep per-ID growth entirely. Instead of minting a fresh vector for every new item, you can map items through a small shared codebook: 'Can discretizing text embeddings improve recommendation transfer?' and 'Can discrete codes transfer better than text embeddings?' use product quantization so a new item reuses learned codes rather than demanding a brand-new row, and the table stops scaling one-to-one with the catalog. A related move is to stop storing fat per-user vectors at all: 'How can user vectors capture diverse interests without exploding in size?' computes interest on the fly against each candidate, so a user's representation isn't a static table entry that has to grow.
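Both sidesteps can be sketched briefly. Everything here is a simplification under stated assumptions: the item codes are taken as given (in VQ-Rec they come from quantizing item text), code vectors are summed (the combination rule is an assumption), and a dot product with softmax is swapped in for Deep Interest Network's learned activation unit. All names and values are hypothetical.

```python
import math


class CodebookEmbedder:
    """Items are described by one code index per codebook, not by a private row."""

    def __init__(self, num_codebooks, codes_per_book, dim):
        self.dim = dim
        # books[b][c] is the vector for code c of codebook b.
        # Constant placeholder values stand in for learned weights.
        self.books = [
            [[0.01 * (b + 1) * (c + 1)] * dim for c in range(codes_per_book)]
            for b in range(num_codebooks)
        ]

    def num_rows(self):
        # Storage depends on the codebooks, not on the catalog size.
        return sum(len(book) for book in self.books)

    def embed(self, codes):
        # Sum the selected code vectors (assumed combination rule).
        out = [0.0] * self.dim
        for book, code in zip(self.books, codes):
            for j, value in enumerate(book[code]):
                out[j] += value
        return out


def candidate_conditioned_interest(history, candidate):
    """Weight past-behavior vectors by similarity to the candidate, then pool."""
    scores = [sum(h * c for h, c in zip(vec, candidate)) for vec in history]
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * vec[j] for w, vec in zip(weights, history))
        for j in range(len(candidate))
    ]


embedder = CodebookEmbedder(num_codebooks=4, codes_per_book=256, dim=8)
new_item_vector = embedder.embed((3, 17, 200, 42))   # no new row allocated

history = [[1.0, 0.0], [0.0, 1.0]]
toward_first = candidate_conditioned_interest(history, [10.0, 0.0])
toward_second = candidate_conditioned_interest(history, [0.0, 10.0])
```

The embedder holds 4 x 256 = 1024 code vectors however many items arrive, and the same user history yields a different pooled vector for each candidate, so nothing per-user has to be stored at a fixed width.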

The thing worth walking away with: 'how does the table grow' looks like a storage question, but the corpus keeps reframing it as a design choice. You can let it grow honestly (collisionless dynamic tables), let it grow dishonestly (fixed hashed tables that decay), or refuse to let it grow per-ID at all (shared discrete codes, candidate-conditional user vectors), and each choice quietly decides who gets served well as the platform ages.

Sources 8 notes

Why do recommendation models fail when new users arrive?

Published recommendation methods assume training-test overlap (transductive learning), but real platforms require inductive learning to score unseen users and items continuously. Feature-based and aggregation approaches exist but face limitations like directional bias and unavailable features.

Do hash collisions really harm popular recommendation items?

Real recommendation IDs follow power-law distributions, not uniform ones. High-frequency users and items collide more often, degrading model quality exactly where traffic is highest, making fixed-size hash tables inadequate for production systems.

Why do hash collisions hurt recommendation models so much?

Monolith's empirical work shows that real recommendation systems have power-law distributed ID frequencies, so collisions accumulate precisely on the entities where models most need to be accurate. Fixed-size hashed tables worsen this over time as new IDs arrive.

Does embedding dimensionality secretly drive popularity bias in recommenders?

Research shows that when user/item embedding dimensions are too small, recommender systems overfit toward popular items to maximize ranking quality. This compounds over time as niche items receive insufficient exposure, and cannot be fixed post-hoc without treating dimensionality as a fairness hyperparameter.

Do embedding dimensions fundamentally limit retrievable document combinations?

Communication complexity theory proves that for any embedding dimension d, there exists a maximum number of top-k document combinations that can be returned as results. Even embeddings optimized directly on test data hit this polynomial limit, demonstrated on trivially simple retrieval tasks.

Can discretizing text embeddings improve recommendation transfer?

VQ-Rec uses product quantization to map item text to discrete codes that index learned embeddings, breaking the tight coupling between text and recommendations. This decoupling prevents text-similarity bias and allows lookup tables to adapt to new domains without retraining the text encoder.

Can discrete codes transfer better than text embeddings?

VQ-Rec demonstrates that mapping item text to discrete codes via product quantization, then to embeddings, improves cross-domain transfer compared to direct text encoding. The discrete intermediate reduces text bias and enables efficient per-domain fine-tuning.

How can user vectors capture diverse interests without exploding in size?

Deep Interest Network weights historical behaviors against each candidate ad, activating only relevant interests dynamically. This preserves dimension efficiency while expressing diverse tastes without lossy compression.
