Why do dialogue systems need probabilistic reasoning?
Explores whether deterministic flowchart-based dialogue systems can handle realistic speech recognition error rates of 15-30 percent, and what alternative approaches might be necessary.
Speech-driven conversation calls for different architectural choices than text-based conversation because of acoustic noise, cascading latency, and reasoning capabilities that remain unmeasured.
Voice assistants traditionally convert speech to text before responding. Does eliminating that middle step reduce latency enough to matter for real-time conversation?
Explores whether self-supervised speech models encode phonetic categories tied to specific languages or instead capture the underlying vocal-tract physics common to all humans. The answer matters for understanding why these models transfer across languages without retraining.
Speech evaluation has strong benchmarks for transcription and translation, but broader comprehension and reasoning tasks over audio lack standardized measurement. This gap may constrain which capabilities researchers prioritize building.