INQUIRING LINE

Can a rejected-edit buffer work like hard negatives in contrastive learning?

This explores whether a stored collection of edits a user rejected could serve as 'hard negatives' — the deliberately-close-to-correct wrong examples that contrastive learning uses to sharpen a model's sense of what counts as right.


This reads the question as asking whether rejected edits (changes a user looked at and turned down) can do the work that hard negatives do in contrastive training: not just any wrong answer, but the near-misses that force a model to draw a sharp boundary. The corpus suggests the idea is sound in principle but comes with a specific, well-documented cost. The most direct support is the finding that negative signal alone can carry training: suppressing incorrect trajectories with only negative samples consistently improves Pass@k and often matches full reinforcement learning (PPO and GRPO), and, crucially, it preserves diversity rather than collapsing the model onto a few high-reward outputs (see: Does negative reinforcement alone outperform full reinforcement learning?). A rejected-edit buffer is exactly this kind of resource: a stream of 'don't do this' signal that doesn't require you to also produce the gold answer.
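As a toy illustration of learning from rejections alone, the sketch below applies one REINFORCE-style step that lowers the log-probability of a rejected candidate under a softmax policy, with no gold answer involved. The three candidate scores, the learning rate, and the function names are assumptions made for illustration; this is not the training setup used in the cited work.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def penalize_rejected(logits, rejected_index, lr=1.0):
    """One gradient-descent step on log p(rejected).

    Only the rejected candidate is used as a training signal;
    no positive (gold) target appears anywhere in the update.
    """
    probs = softmax(logits)
    # For a softmax policy: d log p(a) / d logit_j = 1[j == a] - p_j
    return [
        logit - lr * ((1.0 if j == rejected_index else 0.0) - probs[j])
        for j, logit in enumerate(logits)
    ]

# Hypothetical scores for three candidate edits; the user rejected candidate 0.
logits = [2.0, 1.0, 1.0]
before = softmax(logits)
after = softmax(penalize_rejected(logits, rejected_index=0))

print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
```

In this toy case the probability of the rejected candidate falls and the freed mass is shared by the remaining candidates instead of being concentrated on one designated winner, which is the intuition behind the diversity claim above.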

But the catch is geometric, and the corpus names it precisely. When dense-retrieval systems are trained on structure-targeted hard negatives (negatives chosen to be confusably close to positives), discrimination on the targeted distinction improves only partially while zero-shot performance drops 8–40% in nDCG@10 (see: Does training for compositional sensitivity hurt dense retrieval?). The takeaway for a rejected-edit buffer: the harder and more specific your negatives, the more you risk teaching the model the contours of *those particular rejections* rather than a transferable sense of quality. Hard negatives sharpen a boundary and warp the surrounding space at the same time.
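To make the role of a hard negative concrete, here is a minimal sketch of an InfoNCE-style contrastive loss over cosine similarities, comparing a near-miss rejected edit with an obviously unrelated one. The two-dimensional embeddings, the temperature of 0.1, and all names are hypothetical; the retrieval work cited above may use a different objective.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: -log( exp(s_pos/t) / (exp(s_pos/t) + sum_i exp(s_neg_i/t)) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Hypothetical 2-d embeddings, for illustration only.
context = [1.0, 0.0]
accepted_edit = [0.9, 0.1]
rejected_edit = [0.8, 0.3]    # near-miss: a hard negative
unrelated_edit = [-1.0, 0.2]  # far away: an easy negative

hard_loss = info_nce(context, accepted_edit, [rejected_edit])
easy_loss = info_nce(context, accepted_edit, [unrelated_edit])

print(f"loss with hard negative: {hard_loss:.4f}")
print(f"loss with easy negative: {easy_loss:.2e}")
```

The easy negative contributes almost no loss, so it teaches almost nothing. The hard negative produces a large loss and therefore a large gradient concentrated in a small region of the embedding space, which is where both the sharpening and the warping come from.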

There's also a subtler trap. A rejected edit is only a useful negative if the *reason* it was rejected is the thing you want the model to learn. Work on heuristic-override tasks shows that simply removing a 'bad' cue can hurt, because the real failure lies in integrating conflicting signals, not in failing to filter one out (see: Why does removing spurious cues sometimes hurt model performance?). A rejection often bundles many reasons together; treated as an undifferentiated negative, it can push the model away from things that were actually fine.

This is why the corpus also points toward an alternative to the pure contrastive framing. Rather than treating a rejection as a negative to push away from, you can transform it into a positive target: few-shot prompting reliably converts negative feedback like 'this doesn't work for me' into a retrievable positive preference, no fine-tuning required (see: Can language models bridge the gap between critique and preference?). The reframing literature reinforces that this conversion can preserve the underlying meaning instead of just flipping polarity (see: Does positive reframing preserve meaning better than sentiment transfer?). So the same rejected-edit buffer has two lives: as hard negatives, or as a source of inferred positive intent.
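The conversion step can be sketched as a few-shot prompt builder. The first example pair is taken from the source note on critique and preference; the other two pairs, the instruction wording, and the function name are hypothetical, and the model call itself is deliberately left out.

```python
FEW_SHOT_EXAMPLES = [
    # First pair comes from the cited note; the remaining pairs are hypothetical.
    ("doesn't look good for a date", "prefer more romantic"),
    ("this rewrite is too stiff", "prefer a more conversational tone"),
    ("you cut the caveat I needed", "prefer keeping qualifying statements"),
]

def build_reframing_prompt(rejection_note, examples=FEW_SHOT_EXAMPLES):
    """Build a few-shot prompt asking a model to restate a rejection
    as a positive preference. No model is called here."""
    lines = ["Rewrite each piece of negative feedback as a positive preference.", ""]
    for critique, preference in examples:
        lines.append(f"Feedback: {critique}")
        lines.append(f"Preference: {preference}")
        lines.append("")
    # The final slot is left open for the model to complete.
    lines.append(f"Feedback: {rejection_note}")
    lines.append("Preference:")
    return "\n".join(lines)

prompt = build_reframing_prompt("too many changes at once")
print(prompt)
```

The completed preference could then be stored next to the rejected edit, so the buffer holds both the negative example and an inferred statement of what the user wanted.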

The most interesting place this lands is verification. The deepest payoff of hard negatives isn't in the base model at all; it's in training a downstream judge to catch the near-misses. A two-stage pipeline, in which pooled-cosine recall is followed by a small Transformer verifier operating on token-token similarity maps, reliably rejects 'structural near-misses' that MaxSim-style late interaction waves through (see: Can verification separate structural near-misses from topical matches?). A rejected-edit buffer is a ready-made training set of exactly those near-misses. So the sharper answer to the question may be: yes, but point the buffer at a verifier that learns to *flag* bad edits, rather than at the generator you're hoping will stop producing them. That may sidestep the generalization tax the retrieval work warns about.
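The shape of such a pipeline can be sketched with toy token embeddings. Stage one uses pooled cosine similarity for recall; stage two inspects the full token-token similarity map. The verifier here is a hand-written diagonal-alignment rule standing in for the small Transformer described in the source note, and the one-hot embeddings, thresholds, and function names are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_pool(tokens):
    """Compress a sequence of token vectors into one vector (loses order)."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]

def similarity_map(query_tokens, cand_tokens):
    """Full token-token cosine matrix: rows are query tokens, columns candidate tokens."""
    return [[cosine(q, c) for c in cand_tokens] for q in query_tokens]

def diagonal_score(sim_map):
    """Stand-in verifier (NOT a trained model): mean similarity of same-position tokens."""
    n = min(len(sim_map), len(sim_map[0]))
    return sum(sim_map[i][i] for i in range(n)) / n

def two_stage_accept(query_tokens, cand_tokens,
                     recall_threshold=0.9, verify_threshold=0.8):
    # Stage 1: cheap recall on pooled vectors.
    if cosine(mean_pool(query_tokens), mean_pool(cand_tokens)) < recall_threshold:
        return False
    # Stage 2: verification on the uncompressed interaction pattern.
    return diagonal_score(similarity_map(query_tokens, cand_tokens)) >= verify_threshold

# Hypothetical one-hot token embeddings.
DOG, BITES, MAN = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
query = [DOG, BITES, MAN]      # "dog bites man"
near_miss = [MAN, BITES, DOG]  # "man bites dog": same tokens, different structure

pooled_similarity = cosine(mean_pool(query), mean_pool(near_miss))
print("pooled similarity:", round(pooled_similarity, 3))
print("accept exact match:", two_stage_accept(query, query))
print("accept near-miss:  ", two_stage_accept(query, near_miss))
```

The near-miss is indistinguishable from an exact match after pooling, yet the similarity map exposes the reordering. In a real system the hand-written rule would be replaced by a learned verifier, and a rejected-edit buffer is one plausible source of labeled near-misses to train it on.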


Sources (6 notes)

Does negative reinforcement alone outperform full reinforcement learning?

Training with only negative samples consistently improves Pass@k across the spectrum, often matching full PPO and GRPO. Negative reinforcement suppresses incorrect trajectories while preserving diversity, whereas positive-only reinforcement degrades higher-k performance by concentrating probability mass.

Does training for compositional sensitivity hurt dense retrieval?

Adding structure-targeted negatives to dense retrieval training consistently degrades zero-shot performance (8-40% nDCG@10 drop) while only partially improving compositional discrimination. This is a geometric trade-off in high-dimensional cosine spaces, not a tuning problem.

Why does removing spurious cues sometimes hurt model performance?

Removing spurious cues degrades performance in heuristic override tasks, opposite to shortcut learning predictions. The failure mode is integrating conflicting signals rather than ignoring distractors—a frame problem, not feature selection.

Can language models bridge the gap between critique and preference?

Few-shot LLM prompting can convert natural negative feedback like "doesn't look good for a date" into positive preferences like "prefer more romantic," enabling retrieval systems to find better-matching recommendations without fine-tuning.

Does positive reframing preserve meaning better than sentiment transfer?

The POSITIVE PSYCHOLOGY FRAMES benchmark demonstrates that reframing neutralizes negativity while keeping original content intact, whereas sentiment transfer reverses both polarity and meaning. Reframing is semantically constrained and requires genuine understanding of complementary perspectives.

Can verification separate structural near-misses from topical matches?

A two-stage pipeline—pooled-cosine recall followed by a small Transformer verifier operating on token-token similarity maps—reliably rejects structural near-misses that MaxSim-style late interaction cannot. The verifier succeeds because it operates on full token interaction patterns rather than compressed vectors.
