Can LLMs design reward functions for reinforcement learning?
Can language models help automate the notoriously difficult task of designing reward shaping functions for sparse-reward RL, and if so, how might we structure that collaboration to work around LLMs' weaknesses in stochastic control?
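It helps to be concrete about what a "reward shaping function" for a sparse-reward task looks like, since that is the artifact an LLM would be asked to produce. The sketch below assumes one common form, potential-based shaping. Under standard discounted-MDP assumptions, adding F(s, s') = γΦ(s') − Φ(s) to the environment reward is known to leave the optimal policy unchanged. The grid world, `GOAL`, `GAMMA`, and the Manhattan-distance `potential` are all hypothetical. The potential stands in for a heuristic an LLM might propose; it is not a method taken from this document.

```python
from typing import Tuple

State = Tuple[int, int]  # (row, col) in a hypothetical grid world

GOAL: State = (4, 4)  # hypothetical goal cell
GAMMA = 0.99          # hypothetical discount factor


def sparse_reward(next_state: State) -> float:
    """Environment reward: 1.0 only when the goal is reached, otherwise 0.0."""
    return 1.0 if next_state == GOAL else 0.0


def potential(state: State) -> float:
    """Hypothetical LLM-proposed potential Phi: negative Manhattan distance to the goal."""
    return -float(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))


def shaped_reward(state: State, next_state: State, done: bool,
                  gamma: float = GAMMA) -> float:
    """Sparse reward plus the potential-based term gamma * Phi(s') - Phi(s).

    By convention the potential of a terminal state is treated as 0, so the
    shaping term cannot change which terminal outcome is preferred.
    """
    next_phi = 0.0 if done else potential(next_state)
    return sparse_reward(next_state) + gamma * next_phi - potential(state)


if __name__ == "__main__":
    # Stepping toward the goal earns a small positive shaped reward even though
    # the environment reward is still 0; stepping away is penalized.
    print(shaped_reward((0, 0), (0, 1), done=False))  # toward the goal
    print(shaped_reward((0, 1), (0, 0), done=False))  # away from the goal
    print(shaped_reward((4, 3), (4, 4), done=True))   # reaching the goal
```

The dense signal comes entirely from the potential. A poorly chosen Φ (for example, one that ignores walls or stochastic transitions) slows learning but, because of the potential-based form, does not redirect the agent toward a different optimal policy. That makes this form a relatively safe target for LLM-generated shaping.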