Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes
“In this paper, we attempt to systematise the literature on the attested problems of neural conversation models (conditional language models realised with neural networks) used as chat-partner simulators, and the approaches to addressing them. We show that these approaches can be framed as attempts to ‘tame’ the underlying model, in the sense that they impose further constraints on it. To this end, we identify different targets or intervention points, such as the data, the training regime, and the decoding procedure. We show that the constraints can be related to observations from the field of pragmatics, and more specifically to Grice’s maxims of cooperative conversation (Grice, 1975).”
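To make the idea of a decoding-time intervention point concrete, here is a minimal sketch (not taken from the survey itself; the function name and the toy vocabulary are illustrative) of a repetition penalty in the style of CTRL (Keskar et al., 2019): tokens the model has already generated have their logits dampened before the softmax, constraining the model away from repeating itself without touching the data or the training regime.

```python
import math

def constrained_next_token_probs(logits, generated_ids, repetition_penalty=1.2):
    """Rescale next-token logits so already-generated tokens become
    less likely, then renormalise with a softmax.

    logits: list of floats, one per vocabulary item.
    generated_ids: token ids produced so far in this turn.
    repetition_penalty > 1.0 dampens repeats (1.0 = unconstrained model).
    """
    adjusted = list(logits)
    for tok in set(generated_ids):
        # Divide positive logits and multiply negative ones, so the
        # penalty always pushes the repeated token's probability down.
        if adjusted[tok] > 0:
            adjusted[tok] /= repetition_penalty
        else:
            adjusted[tok] *= repetition_penalty
    # Softmax over the adjusted logits.
    m = max(adjusted)
    exps = [math.exp(x - m) for x in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of four tokens; token 2 was already generated.
probs = constrained_next_token_probs([1.0, 0.5, 2.0, -0.5], [2])
print(probs)  # token 2's probability drops relative to the raw model
```

The same pattern generalises to other decoding-level constraints the survey groups under this intervention point, such as blocking repeated n-grams outright.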
“Pre-trained language models are a strong foundation for neural generation models, but unsupervised training on dialogue data alone solves only one aspect of being a good artificial dialogue partner: it makes dialogues fluent. All other aspects require labelled, high-quality datasets and strong automated metrics for training and evaluation. We observe an emerging trend away from pure end-to-end training (i.e., building complete chatbots) towards more specific solutions for individual dialogue aspects, such as diversifying the generated words, learning to use general knowledge, or maintaining consistent personality traits. We support that trend: human language, and dialogue in particular, is highly complex, and the generalisation effort that neural networks have to make is correspondingly high. We want to stress one particular issue in the field, namely the lack of comparability. The trend towards very specific solutions leads to many different training datasets that are not comparable with each other. Additionally, the lack of established metrics forces the use of human evaluations or self-defined metrics, which are likewise not comparable with one another. This makes it hard to quantify the reining-in effect of different approaches; comparability should therefore receive more attention in future work.”
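As one example of the kind of automatic metric the conclusion alludes to, the sketch below computes distinct-n (Li et al., 2016), a common measure of lexical diversity in generated dialogue; the helper name and the toy utterances are illustrative, not from the survey. Note that whether the ratio is computed per utterance or over the whole corpus changes the score, which is exactly the comparability problem the authors raise.

```python
from collections import Counter

def distinct_n(utterances, n=2):
    """Corpus-level distinct-n: the number of unique n-grams divided
    by the total number of n-grams across all generated utterances.
    Higher values indicate more lexical diversity."""
    ngrams = Counter()
    for utt in utterances:
        tokens = utt.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

generated = ["i do not know", "i do not know", "that sounds great"]
print(distinct_n(generated, n=1))  # unigram diversity: 7/11 ≈ 0.64
print(distinct_n(generated, n=2))  # bigram diversity: 5/8 = 0.625
```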