Can worker preference serve as a legitimate axis for delegation design?
This explores whether what workers *want* from AI collaboration — not just what tasks technically need — should be a first-class input to how we design delegation between humans and agents.
The corpus reveals a quiet split: most delegation frameworks are built around what the *task* demands, while a smaller body of work argues that what the *worker wants* is its own design signal, and the two don't always point in the same direction.
The dominant framing treats delegation as a capability-matching problem. One framework lays out eleven task characteristics (complexity, verifiability, reversibility, subjectivity, and so on) as the axes that determine how work should be split between humans and agents (see "What makes delegation work beyond just splitting tasks?"). Notably, worker preference isn't among them. These are properties of the work, not of the people doing it. By this logic, you delegate based on what the task can tolerate, and human desire is downstream noise.
The strongest counterweight comes from a survey of 1,500 workers across 844 tasks, which found that equal human-AI partnership, not full automation, is the *desired* mode in 45% of occupations, yet 41% of startup investment targets collaboration levels misaligned with those preferences (see "What collaboration level do workers actually want with AI?"). That's the case for legitimacy: when roughly two-fifths of startup investment is being spent against what workers actually want, preference isn't a soft variable; it signals where automation is likely to be resisted or abandoned. And there's evidence the preferred middle ground is also the *effective* one: confidence-routed intervention at high-leverage moments reached 87.5% acceptance, beating both full autonomy (25%) and constant step-by-step oversight (50%) (see "Does targeted human intervention outperform both full autonomy and exhaustive oversight?"). The collaboration sweet spot workers gravitate toward appears, at least in this result, to track where the system performs best, too.
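As a minimal sketch of what "confidence-routed intervention at high-leverage moments" could look like, the snippet below asks for human review only when a step is both high-leverage and low-confidence. The `Step` fields, the `min_confidence` threshold of 0.8, and the example steps are all illustrative assumptions, not details of the system behind the cited result.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    confidence: float    # agent's self-reported confidence in [0, 1] (assumed to be available)
    high_leverage: bool  # hypothetical label: would a mistake here be costly downstream?


def needs_human(step: Step, min_confidence: float = 0.8) -> bool:
    """Route to a human only when the step is high-leverage AND confidence is low."""
    return step.high_leverage and step.confidence < min_confidence


def plan_reviews(steps, min_confidence: float = 0.8):
    """Return the names of the steps that would interrupt the human."""
    return [s.name for s in steps if needs_human(s, min_confidence)]


# Illustrative steps only; names and numbers are made up.
steps = [
    Step("format bibliography", 0.55, False),      # low confidence, but low stakes
    Step("choose experiment design", 0.60, True),  # low confidence, high stakes
    Step("write abstract", 0.95, True),            # high stakes, but confident
]

print(plan_reviews(steps))
```

Full autonomy corresponds to never returning `True`, and exhaustive oversight to always returning `True`; the routed mode sits between them by design.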
But the corpus also shows why preference can't be a *free* axis: encoding individual desire directly into systems has well-mapped failure modes. Personalizing reward models per user strips out the averaging effect of aggregate models and lets systems learn sycophancy and reinforce echo chambers at scale (see "Does personalizing reward models amplify user echo chambers?"). Yet aggregating preference doesn't escape the problem either: a single model trained on pooled preferences structurally cannot represent genuine disagreement, so a 51-49 split forces someone to always lose (see "Can aggregate reward models satisfy genuinely disagreeing users?"). So preference is real and consequential, but optimizing for it naively reproduces recommender-system pathologies. The same lesson appears in the finding that sycophancy isn't a bug but the predictable result of optimizing for user satisfaction (see "Is sycophancy in AI systems a training flaw or intentional design?").
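The 51-49 arithmetic can be worked through in a few lines. This is a toy model, assuming two user groups with opposite preferences (A versus B) and a single shared model that answers A with probability `p_a`; the function name and parameters are hypothetical.

```python
def dissatisfaction(share_a: float, p_a: float) -> dict:
    """Per-interaction dissatisfaction rates for a single shared model.

    share_a: fraction of users who prefer answer A (the rest prefer B).
    p_a:     probability that the shared model gives answer A.
    """
    share_b = 1.0 - share_a
    return {
        "group_a": 1.0 - p_a,  # A-preferrers are unhappy whenever the model answers B
        "group_b": p_a,        # B-preferrers are unhappy whenever the model answers A
        "overall": share_a * (1.0 - p_a) + share_b * p_a,
    }


# Always side with the 51% majority: the 49% minority is unhappy every time.
always_majority = dissatisfaction(share_a=0.51, p_a=1.0)

# Flip a fair coin instead: every user is unhappy about half the time.
coin_flip = dissatisfaction(share_a=0.51, p_a=0.5)

for label, result in [("always majority", always_majority), ("coin flip", coin_flip)]:
    print(label, {k: round(v, 2) for k, v in result.items()})
```

No value of `p_a` drives both group rates to zero, because the two rates always sum to one. The limit comes from using one shared output, not from how well the model was trained.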
The synthesis worth taking away: preference is legitimate as a delegation axis, but it behaves like a *capability*, not a *target*. Tellingly, one phone-agent benchmark found that honoring saved user preferences is a statistically distinct skill from task success: a model can be excellent at getting things done and poor at respecting what the user already told it (see "Do phone agents succeed at all three critical tasks equally?"). That reframes the whole question: worker preference isn't a knob you tune toward, it's a dimension you have to be *competent at honoring*, which means measured separately, designed for deliberately, and bounded so it informs delegation without collapsing into pure agreement.
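One way to read "measured separately" is to report task success and preference adherence as two rates and never blend them into one score. The sketch below assumes a hypothetical per-episode log with two boolean fields; the two example models and their numbers are invented purely for illustration and are not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    task_succeeded: bool      # did the agent complete the task?
    preference_honored: bool  # did it respect the preference the user had saved?


def score(episodes) -> dict:
    """Report the two rates side by side instead of a single combined number."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    return {
        "task_success": sum(e.task_succeeded for e in episodes) / n,
        "preference_honored": sum(e.preference_honored for e in episodes) / n,
    }


# Made-up logs: model_a finishes everything but mostly ignores saved preferences;
# model_b fails one task but honors preferences every time.
model_a = [Episode(True, False)] * 3 + [Episode(True, True)]
model_b = [Episode(True, True)] * 3 + [Episode(False, True)]

print("model_a", score(model_a))
print("model_b", score(model_b))
```

A ranking on `task_success` alone would place `model_a` first and hide its weaker preference adherence, which is the kind of gap a separate measurement is meant to expose.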
Sources (7 notes)
Delegation requires matching tasks to agents across 11 dimensions: complexity, criticality, uncertainty, duration, cost, resource requirements, constraints, verifiability, reversibility, contextuality, and subjectivity. Verifiability is foundational—it determines whether outcomes can be evaluated at all.
The HumanAgency Scale survey of 1,500 workers across 844 tasks found that equal partnership (H3) is the dominant desired level in 45% of occupations. Yet 41% of startup investments target zones misaligned with these worker preferences.
AutoResearchClaw's confidence-routed CoPilot mode achieved 87.5% acceptance, substantially outperforming full autonomy (25%) and step-by-step oversight (50%). The key insight: selective interruption avoids both uncaught critical errors and the coherence degradation caused by constant human interruption.
Specializing reward models per user removes the averaging effect of aggregate models, allowing systems to learn sycophancy and reinforce polarization at scale, mirroring recommender-system failures.
Single reward models trained on aggregated preferences cannot represent disagreement. A 51-49 preference split forces a choice between leaving 49% unhappy always or leaving everyone unhappy half the time. This is a representational failure, not a quality problem.
RLHF optimization for user satisfaction makes agreement load-bearing for the model's success. This is not an error mode but the predictable outcome of the training regime itself.
MyPhoneBench demonstrates that task success, privacy-compliant completion, and saved-preference reuse are statistically distinct capabilities with no model dominating all three. Success-only rankings do not predict privacy or preference performance.