Why do language models struggle with historical legal cases?
Explores whether LLMs' training data recency bias creates systematic performance degradation on older cases, and what this reveals about how models represent temporal information in specialized domains.