Why do standard supervised models miss high-order connectivity in recommendations?

This note explores why models trained to predict observed user-item interactions directly tend to capture only first-order signal, and why they miss the multi-hop, transitive relationships (user→item→attribute→other item) that graph-based methods call 'high-order connectivity.'

This is really a question about what a training objective can and can't see. A standard supervised recommender learns to predict the interactions it was shown: this user clicked this item. That captures *first-order* signal well, but it never forces the model to reason about chains: users who liked A also liked B, B shares an attribute with C, so this user might want C. Those multi-hop paths are exactly the 'high-order connectivity' that KGAT, the method covered in Can graphs unify collaborative filtering and side information?, is built to recover. KGAT merges the user-item graph with an item knowledge graph and uses attention-based propagation across hops, explicitly noting that standard supervised learning misses these transitive connections because nothing in its loss propagates signal along the graph.
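To make the gap concrete, here is a minimal sketch in Python with NumPy. The toy matrices, the item labels A, B, C, and all variable names are hypothetical, and plain path counting stands in for the learned, attention-weighted propagation a graph model would use. It only illustrates that the observed-interaction matrix carries no signal for a user-item pair that a three-hop path does connect.

```python
import numpy as np

# Hypothetical toy data: 2 users, 3 items (A, B, C), 1 attribute.
# Rows are users; columns are items A, B, C.
interactions = np.array([
    [1, 0, 0],   # user 0 clicked A only
    [1, 1, 0],   # user 1 clicked A and B
])
# Rows are items A, B, C; the single column is one shared attribute.
item_attr = np.array([[0], [1], [1]])  # B and C share the attribute

# First-order signal: exactly what the supervised target contains.
first_order = interactions

# Item-item links through shared users (item -> user -> item).
co_click = interactions.T @ interactions
np.fill_diagonal(co_click, 0)

# Item-item links through shared attributes (item -> attribute -> item).
shared_attr = item_attr @ item_attr.T
np.fill_diagonal(shared_attr, 0)

# Path counts for: user -> clicked item -> co-clicked item -> attribute neighbor.
multi_hop = interactions @ co_click @ shared_attr

print("observed user 0 / item C:", first_order[0, 2])
print("multi-hop paths user 0 / item C:", multi_hop[0, 2])
```

User 0 never touched C, so a loss defined only on `interactions` has nothing to say about that pair; the path through A, B, and the shared attribute is visible only once the graph is traversed.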

The interesting twist is that you don't always need a graph neural network to fix this; you need the right structural prior. The linear autoencoder work (Can simpler models beat deep networks for recommendation systems? and its sibling Can a linear model beat deep collaborative filtering?) beats deep baselines on most datasets with a single item-item weight matrix whose diagonal is constrained to zero. By forbidding an item from predicting itself, the constraint *forces* every prediction to route through item-to-item relationships, and the learned negative weights capture which items repel each other. Their lesson is blunt: the structural bias mattered more than model capacity. A high-capacity supervised model with no such prior will happily fit direct co-occurrence and stop there.
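One way to see how little machinery the zero-diagonal constraint needs is a sketch of the closed-form ridge solution usually associated with EASE. The function name, the `l2` regularization value, and the toy click matrix below are illustrative assumptions, not details taken from the notes.

```python
import numpy as np

def fit_zero_diagonal_item_model(X, l2=1.0):
    """Ridge-regress each item on all *other* items, in closed form.

    X is a (users x items) interaction matrix. The returned item-item
    matrix B has a zero diagonal, so no item can predict itself.
    """
    X = np.asarray(X, dtype=float)
    gram = X.T @ X + l2 * np.eye(X.shape[1])
    P = np.linalg.inv(gram)
    # Off-diagonal solution: B[i, j] = -P[i, j] / P[j, j].
    # Broadcasting divides column j by -P[j, j].
    B = P / (-np.diag(P))
    np.fill_diagonal(B, 0.0)  # enforce the self-prediction ban
    return B

# Hypothetical toy clicks: 4 users x 3 items.
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
], dtype=float)

B = fit_zero_diagonal_item_model(X, l2=1.0)
scores = X @ B  # each score is built only from a user's other items
print(np.round(B, 3))
```

Because the diagonal is zero, the score for an item is computed entirely from the user's other items, which is the sense in which every prediction has to route through item-to-item relationships.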

So why do plain supervised models stop there? Partly because their default pressure runs the other way — toward collapse, not propagation. Does embedding dimensionality secretly drive popularity bias in recommenders? shows that when embedding dimensions are small, the ranking objective overfits toward popular items, drowning out the niche connections that high-order paths would surface. And Why do hash collisions hurt recommendation models so much? shows the representation layer itself degrades on exactly the high-frequency entities the model needs most. Both are cases where the model's structure quietly flattens the relationship graph instead of traversing it.
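The hash-collision point can be illustrated with a rough simulation. This is not Monolith's implementation: the SHA-256-modulo hashing scheme, the table size, the ID counts, and the helper names are all assumptions chosen for the sketch. It measures what fraction of IDs share an embedding slot with at least one other ID as the ID population grows against a fixed-size table.

```python
import hashlib
from collections import Counter

def slot_for(entity_id, table_size):
    """Map an ID to a row of a fixed-size embedding table (assumed scheme)."""
    digest = hashlib.sha256(str(entity_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % table_size

def colliding_fraction(num_ids, table_size):
    """Fraction of IDs whose slot is shared with at least one other ID."""
    slots = [slot_for(i, table_size) for i in range(num_ids)]
    occupancy = Counter(slots)
    shared = sum(1 for s in slots if occupancy[s] > 1)
    return shared / num_ids

TABLE_SIZE = 4096  # hypothetical fixed table size
early = colliding_fraction(500, TABLE_SIZE)    # few IDs seen so far
late = colliding_fraction(8000, TABLE_SIZE)    # after many new IDs arrive

print(f"colliding fraction, 500 IDs:  {early:.3f}")
print(f"colliding fraction, 8000 IDs: {late:.3f}")
```

The sketch treats every ID as equally frequent. Under the power-law traffic described in the hash-collision note, a shared slot matters most when one of its occupants is a high-frequency entity, which this uniform count does not capture.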

The constructive answers in the corpus all add a channel the bare supervised objective lacks. Can autoencoders solve the cold-start problem in recommendations? uses graph features plus autoencoders to find non-linear relationships that linear hybrids miss — and as a bonus solves cold-start, because high-order paths through side information reach items with no direct interaction history. That cold-start payoff is the tell: high-order connectivity isn't a luxury, it's what lets a system say something about an item it has never seen anyone click.

The thing worth walking away with: 'high-order connectivity' isn't a fancier version of accuracy — it's a fundamentally different thing to optimize for. A model is only as relational as its objective and its structure force it to be. You can buy that relational reasoning with a graph and attention (KGAT), or astonishingly cheaply with a single well-chosen constraint (EASE) — but you almost never get it for free from raw supervised fitting on observed clicks.

Sources 6 notes

Can graphs unify collaborative filtering and side information?

KGAT merges user-item interaction graphs with item knowledge graphs into a Collaborative Knowledge Graph, using attention-based propagation to capture both user-similarity and attribute-similarity signals simultaneously—including high-order connections that standard supervised learning methods miss.

Can simpler models beat deep networks for recommendation systems?

EASE, a shallow linear item-item weight matrix with diagonal constrained to zero, beats deep neural baselines on most datasets. The constraint forces generalization by forbidding self-prediction, while learned negative weights capture item dissimilarity—a structural prior more valuable than model capacity.

Can a linear model beat deep collaborative filtering?

ESLER, a single-layer linear autoencoder constrained so items cannot predict themselves, outperforms most deep CF models. The constraint forces prediction through item relationships, and negative weights encoding anti-affinity prove essential—structural bias matters more than model capacity.

Does embedding dimensionality secretly drive popularity bias in recommenders?

Research shows that when user/item embedding dimensions are too small, recommender systems overfit toward popular items to maximize ranking quality. This compounds over time as niche items receive insufficient exposure, and cannot be fixed post-hoc without treating dimensionality as a fairness hyperparameter.

Why do hash collisions hurt recommendation models so much?

Monolith's empirical work shows that real recommendation systems have power-law-distributed ID frequencies, so collisions accumulate precisely on the entities whose representations the model most needs to be accurate. Fixed-size hashed tables worsen this over time as new IDs arrive.

Can autoencoders solve the cold-start problem in recommendations?

GHRS uses graph features and deep autoencoders to integrate rating history with side information, enabling predictions for new users and items by discovering non-linear relationships that linear hybrid methods miss.
